Abstract

The n-step delayed sharing information structure is investigated. This information structure
comprises K controllers that share their information with a delay of n time steps. It
is a link between the classical information structure, where information is
shared perfectly between the controllers, and a non-classical information structure, where there is
no "lateral" sharing of information among the controllers. Structural results for optimal control
strategies for systems with such information structures are presented. A sequential methodology
for finding the optimal strategies is also derived. The solution approach provides insight into
identifying structural results and sequential decomposition for general decentralized stochastic
control problems.
1. Introduction
1.1. Motivation 

One of the difficulties in the optimal design of decentralized control systems is handling the increase of
data at the control stations with time. This increase in data means that the domain of the control laws
grows with time, which, in turn, creates two difficulties. First, the number of control strategies
increases doubly exponentially with time; this makes it harder to search for an optimal strategy.
Second, even if an optimal strategy is found, implementing functions with a time-increasing domain
is difficult.
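
To make the doubly exponential growth concrete, the following sketch counts the deterministic control laws available to a single station that remembers all of its own observations. It assumes, purely for illustration, a finite observation alphabet and a finite action set; the names `num_obs` and `num_actions` are hypothetical and do not appear in the model of this paper.

```python
def num_control_laws(num_obs: int, num_actions: int, t: int) -> int:
    """Count deterministic maps from length-t observation histories to actions."""
    domain_size = num_obs ** t          # number of histories: exponential in t
    return num_actions ** domain_size   # number of maps: doubly exponential in t


# Binary observations and binary actions (an illustrative assumption).
for t in range(1, 5):
    print(t, num_control_laws(num_obs=2, num_actions=2, t=t))
```

Even in this smallest nontrivial case the count squares at every step, which is why an exhaustive search over strategies quickly becomes impractical.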

In centralized stochastic control [1], these difficulties can be circumvented by using the conditional
probability of the state given the data available at the control station as a sufficient statistic
(where the data available to a control station consists of all observations and control actions
up to the current time). This conditional probability, called the information state, takes values in a
time-invariant space. Consequently, we can restrict attention to control laws with a time-invariant
domain. Such results, in which data that is increasing with time is "compressed" to a sufficient
statistic taking values in a time-invariant space, are called structural results. While the information
state and structural result for centralized stochastic control problems are well known, no general
methodology to find such information states or structural results exists for decentralized stochastic
control problems.

The structural results in centralized stochastic control are related to the concept of separation.
In centralized stochastic control, the information state, which is the conditional probability of the state
given all the available data, does not depend on the control strategy (which is the collection of
control laws used at different time instants). This has been called a one-way separation between
estimation and control. An important consequence of this separation is that for any given choice of
control laws till time t - 1 and a given realization of the system variables till time t, the information
states at future times do not depend on the choice of the control law at time t but only on the
realization of the control action at time t. Thus, the future information states are separated from the
choice of the current control law. This fact is crucial for the formulation of the classical dynamic
program, where at each step the optimization problem is to find the best control action for a
given realization of the information state. No analogous separation results are known for general
decentralized systems.

In this paper, we find structural results for decentralized control systems with delayed sharing
information structures. In a system with n-step delayed sharing, every control station knows
the n-step prior observations and control actions of all other control stations. This information
structure, proposed by Witsenhausen in [2], is a link between the classical information structures,
where information is shared perfectly among the controllers, and the non-classical information
structures, where there is no "lateral" sharing of information among the controllers. In his seminal
paper [2], Witsenhausen asserted a structural result for this model without any proof. Varaiya and
Walrand [3] proved that Witsenhausen's assertion was true for n = 1 but false for n > 1. For n > 1,
Kurtaran [4] proposed another structural result. However, Kurtaran proved his result only for the
terminal time step (that is, the last time step in a finite horizon problem); for non-terminal time
steps, he gave an abbreviated argument, which we believe is incomplete. (The details are given in
Section 5 of the paper.)

We prove two structural results for the optimal control laws for the delayed sharing information
structure. We compare our results to those conjectured by Witsenhausen and show that our
structural results for the n-step delayed sharing information structure simplify to that of Witsenhausen
for n = 1; for n > 1, our results are different from the result proposed by Kurtaran.

Our structural results do not have the separated nature of centralized stochastic control: for any
given realization of the system variables till time t, the realization of information states at future
times depends on the choice of the control law at time t. However, our second structural result
shows that this dependence only propagates to the next n - 1 time steps. Thus, the information
states from time t + n - 1 onwards are separated from the choice of control laws before time t;
they only depend on the realization of control actions at time t. We call this a delayed separation
between information states and control laws.

The absence of classical separation rules out the possibility of a classical dynamic program to 
find the optimum control laws. However, optimal control laws can still be found in a sequential 
manner. Based on the two structural results, we present two sequential methodologies to find 
optimal control laws. Unlike classical dynamic programs, each step in our sequential decomposition 
involves optimization over a space of functions instead of the space of control actions. 

1.2. Notation 

Random variables are denoted by upper case letters; their realizations by the corresponding lower
case letters. $X_{a:b}$ is shorthand for the vector $(X_a, X_{a+1}, \dots, X_b)$, while $X^{c:d}$ is shorthand
for the vector $(X^c, X^{c+1}, \dots, X^d)$. The combined notation $X^{c:d}_{a:b}$ is shorthand for the vector
$(X^j_i : i = a, a+1, \dots, b,\ j = c, c+1, \dots, d)$. $\mathbb{P}(\cdot)$ is the probability of an event, and $\mathbb{E}\{\cdot\}$ is the
expectation of a random variable. For a collection of functions $g$, we use $\mathbb{P}^g(\cdot)$ and $\mathbb{E}^g\{\cdot\}$ to
denote that the probability measure/expectation depends on the choice of functions in $g$. $\mathbf{1}_A(\cdot)$ is
the indicator function of a set $A$. For singleton sets $\{a\}$, we also denote $\mathbf{1}_{\{a\}}(\cdot)$ by $\mathbf{1}_a(\cdot)$. For a finite
set $A$, $\mathcal{P}\{A\}$ denotes the space of probability mass functions on $A$. For convenience of exposition,
we will assume all sets have finite cardinality.
1.3. Model



Consider a system consisting of a plant and K controllers with decentralized information. At time
t, t = 1, ..., T, the state of the plant $X_t$ takes values in $\mathcal{X}$; the control action $U^k_t$ at station k,
k = 1, ..., K, takes values in $\mathcal{U}^k$. The initial state $X_0$ of the plant is a random variable. With time,
the plant evolves according to

$$X_t = f_t(X_{t-1}, U^{1:K}_t, V_t) \tag{1}$$

where $V_t$ is a random variable taking values in $\mathcal{V}$. $\{V_t;\ t = 1, \dots, T\}$ is a sequence of independent
random variables that are also independent of $X_0$.

The system has K observation posts. At time t, t = 1, ..., T, the observation $Y^k_t$ of post k,
k = 1, ..., K, takes values in $\mathcal{Y}^k$. These observations are generated according to

$$Y^k_t = h^k_t(X_{t-1}, W^k_t) \tag{2}$$

where $W^k_t$ are random variables taking values in $\mathcal{W}^k$. $\{W^k_t;\ t = 1, \dots, T;\ k = 1, \dots, K\}$ are
independent random variables that are also independent of $X_0$ and $\{V_t;\ t = 1, \dots, T\}$.

The system has n-step delayed sharing. This means that at time t, control station k observes
the current observation $Y^k_t$ of observation post k, the n steps old observations $Y^{1:K}_{t-n}$ of all posts,
and the n steps old actions $U^{1:K}_{t-n}$ of all stations. Each station has perfect recall; so, it remembers
everything that it has seen and done in the past. Thus, at time t, the data available at station k can
be written as $(\Delta_t, \Lambda^k_t)$, where

$$\Delta_t := (Y^{1:K}_{1:t-n}, U^{1:K}_{1:t-n})$$

is the data known to all stations and

$$\Lambda^k_t := (Y^k_{t-n+1:t}, U^k_{t-n+1:t-1})$$

is the additional data known at station k, k = 1, ..., K. Let $\mathcal{D}_t$ be the space of all possible
realizations of $\Delta_t$, and $\mathcal{C}^k$ be the space of all possible realizations of $\Lambda^k_t$. Station k chooses action
$U^k_t$ according to a control law $g^k_t$, i.e.,

$$U^k_t = g^k_t(\Lambda^k_t, \Delta_t). \tag{3}$$

The choice of $g = \{g^k_t;\ k = 1, \dots, K;\ t = 1, \dots, T\}$ is called a design or a control strategy. $\mathcal{G}$
denotes the class of all possible designs. At time t, a cost $c_t(X_t, U^1_t, \dots, U^K_t)$ is incurred. The
performance $J(g)$ of a design is given by the expected total cost under it, i.e.,

$$J(g) = \mathbb{E}^g\left\{ \sum_{t=1}^{T} c_t(X_t, U^{1:K}_t) \right\} \tag{4}$$

where the expectation is with respect to the joint measure on all the system variables induced by
the choice of $g$. We consider the following problem.

Problem 1 Given the statistics of the primitive random variables $X_0$, $\{V_t;\ t = 1, \dots, T\}$, $\{W^k_t;\ 
k = 1, \dots, K;\ t = 1, \dots, T\}$, the plant functions $\{f_t;\ t = 1, \dots, T\}$, the observation functions $\{h^k_t;\ 
k = 1, \dots, K;\ t = 1, \dots, T\}$, and the cost functions $\{c_t;\ t = 1, \dots, T\}$, choose a design $g^*$ from $\mathcal{G}$
that minimizes the expected cost given by (4).
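
The split of each station's data into common and private parts can be illustrated with a small sketch. It assumes histories are stored as nested dictionaries indexed by station and by time (1-indexed); the helper names `shared_info` and `private_info` and this storage layout are hypothetical choices made only for the example.

```python
def shared_info(Y, U, t, n):
    """Common data at time t: observations and actions of all stations up to time t - n."""
    last = t - n  # if last < 1 the ranges below are empty: nothing is shared yet
    return {
        "Y": {k: [Y[k][s] for s in range(1, last + 1)] for k in Y},
        "U": {k: [U[k][s] for s in range(1, last + 1)] for k in U},
    }


def private_info(Y, U, k, t, n):
    """Additional data of station k at time t: its last n observations and last n - 1 actions."""
    first = max(1, t - n + 1)
    return {
        "Y": [Y[k][s] for s in range(first, t + 1)],  # times t-n+1, ..., t
        "U": [U[k][s] for s in range(first, t)],      # times t-n+1, ..., t-1
    }


# Toy histories for K = 2 stations at time t = 4; actions exist only up to t - 1.
Y = {k: {s: f"y{k}_{s}" for s in range(1, 5)} for k in (1, 2)}
U = {k: {s: f"u{k}_{s}" for s in range(1, 4)} for k in (1, 2)}
delta = shared_info(Y, U, t=4, n=2)
lam1 = private_info(Y, U, k=1, t=4, n=2)
```

In this sketch the common part keeps growing with t, while the private part has a fixed length once t is at least n; this is the part of the data that the structural results leave uncompressed.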
1.4. The structural results

Witsenhausen [2] asserted the following structural result for Problem 1.

Structural Result (Witsenhausen [2]) In Problem 1, without loss of optimality we can
restrict attention to control strategies of the form

$$U^k_t = g^k_t(\Lambda^k_t, \mathbb{P}(X_{t-n} \mid \Delta_t)). \tag{5}$$

Witsenhausen's result claims that all control stations can "compress" the common information
$\Delta_t$ to a sufficient statistic $\mathbb{P}(X_{t-n} \mid \Delta_t)$. Unlike $\Delta_t$, the size of $\mathbb{P}(X_{t-n} \mid \Delta_t)$ does not increase with
time.

As mentioned earlier, Witsenhausen asserted this result without a proof. Varaiya and Walrand [3]
proved that the above separation result is true for n = 1 but false for n > 1. Kurtaran [4] proposed
an alternate structural result for n > 1.

Structural Result (Kurtaran [4]) In Problem 1, without loss of optimality we can restrict
attention to control strategies of the form

$$U^k_t = g^k_t\left(Y^k_{t-n+1:t},\ \mathbb{P}^{g^{1:K}_{1:t-1}}(X_{t-n}, U^{1:K}_{t-n+1:t-1} \mid \Delta_t)\right). \tag{6}$$

Kurtaran used a different labeling of the time indices, so the statement of the result in his paper is
slightly different from what we have stated above. Kurtaran's result claims that all control stations
can "compress" the common information $\Delta_t$ to a sufficient statistic $\mathbb{P}^{g^{1:K}_{1:t-1}}(X_{t-n}, U^{1:K}_{t-n+1:t-1} \mid \Delta_t)$,
whose size does not increase with time.

Kurtaran proved his result for only the terminal time step and gave an abbreviated argument
for non-terminal time steps. We believe that his proof is incomplete for reasons that we will point
out in Section 5. In this paper, we prove two alternative structural results.

First Structural Result (this paper) In Problem 1, without loss of optimality we can restrict
attention to control strategies of the form

$$U^k_t = g^k_t\left(\Lambda^k_t,\ \mathbb{P}^{g^{1:K}_{1:t-1}}(X_{t-1}, \Lambda^{1:K}_t \mid \Delta_t)\right). \tag{7}$$

This result claims that all control stations can "compress" the common information $\Delta_t$ to a
sufficient statistic $\mathbb{P}^{g^{1:K}_{1:t-1}}(X_{t-1}, \Lambda^{1:K}_t \mid \Delta_t)$, whose size does not increase with time.

Second Structural Result (this paper) In Problem 1, without loss of optimality we can restrict
attention to control strategies of the form

$$U^k_t = g^k_t\left(\Lambda^k_t,\ \mathbb{P}(X_{t-n} \mid \Delta_t),\ r^{1:K}_t\right), \tag{8}$$

where $r^k_t$ is a collection of partial functions of the previous n - 1 control laws of each controller,

$$r^k_t := \left\{ g^k_m(\cdot, Y^k_{m-n+1:t-n}, U^k_{m-n+1:t-n}, \Delta_m),\ t - n + 1 \le m \le t - 1 \right\},$$

for k = 1, 2, ..., K. Observe that $r^k_t$ depends only on the previous n - 1 control laws ($g^k_{t-n+1:t-1}$)
and the realization of $\Delta_t$ (which consists of $Y^{1:K}_{1:t-n}$, $U^{1:K}_{1:t-n}$). This result claims that the belief
$\mathbb{P}(X_{t-n} \mid \Delta_t)$ and the realization of the partial functions $r^{1:K}_t$ form a sufficient representation of $\Delta_t$
in order to optimally select the control action at time t.



Our structural results cannot be derived from Kurtaran's result and vice versa. At present, we
are not sure of the correctness of Kurtaran's result. As we mentioned before, we believe that
the proof given by Kurtaran is incomplete. We have not been able to complete Kurtaran's proof;
neither have we been able to find a counterexample to his result.

Kurtaran's and our structural results differ from those asserted by Witsenhausen in a fundamental
way. The sufficient statistic (also called information state) $\mathbb{P}(X_{t-n} \mid \Delta_t)$ of Witsenhausen's assertion
does not depend on the control strategy. The sufficient statistics $\mathbb{P}^{g^{1:K}_{1:t-1}}(X_{t-n}, U^{1:K}_{t-n+1:t-1} \mid \Delta_t)$ of
Kurtaran's result and $\mathbb{P}^{g^{1:K}_{1:t-1}}(X_{t-1}, \Lambda^{1:K}_t \mid \Delta_t)$ of our first result depend on the control laws used
before time t. Thus, for a given realization of the primitive random variables till time t, the
realization of future information states depends on the choice of control laws at time t. On the
other hand, in our second structural result, the belief $\mathbb{P}(X_{t-n} \mid \Delta_t)$ is indeed independent of the
control strategy; however, information about the previous n - 1 control laws is still needed in the
form of the partial functions $r^k_t$. Since the partial functions $r^{1:K}_t$ do not depend on control laws
used before time t - n + 1, we conclude that the information state at time t is separated from the
choice of control laws before time t - n + 1. We call this a delayed separation between information
states and control laws.

The rest of this paper is organized as follows. We prove our first structural result in Section 2.
Then, in Section 3 we derive our second structural result. We discuss a special case of delayed
sharing information structures in Section 4. We discuss Kurtaran's structural result in Section 5
and conclude in Section 6.

2. Proof of the first structural result 

In this section, we prove the structural result (7) for optimal strategies of the K control stations.
For ease of notation, we first prove the result for K = 2, and then show how to extend it to
general K.

2.1. Two Controller system (K = 2) 

The proof for K = 2 proceeds as follows: 

1. First, we formulate a centralized stochastic control problem from the point of view of a
coordinator who observes the shared information $\Delta_t$, but does not observe the private information
$(\Lambda^1_t, \Lambda^2_t)$ of the two controllers.

2. Next, we argue that any strategy for the coordinator's problem can be implemented in the
original problem and vice versa. Hence, the two problems are equivalent.

3. Then, we identify states sufficient for input-output mapping for the coordinator's problem.

4. Finally, we transform the coordinator's problem into an MDP (Markov decision process), and
obtain a structural result for the coordinator's problem. This structural result is also a
structural result for the delayed sharing information structures due to the equivalence between
the two problems.

Below, we elaborate on each of these stages. 
Stage 1

We consider the following modified problem. In the model described in Section 1.3, in addition
to the two controllers, a coordinator that knows the common (shared) information $\Delta_t$ available to
both controllers at time t is present. At time t, the coordinator decides the partial functions

$$\gamma^k_t : \mathcal{C}^k \to \mathcal{U}^k$$

for each controller k, k = 1, 2. The choice of the partial functions at time t is based on the
realization of the common (shared) information and the partial functions selected before time t.
These functions map each controller's private information $\Lambda^k_t$ to its control action $U^k_t$ at time t.
The coordinator then informs all controllers of all the partial functions it selected at time t. Each
controller then uses its assigned partial function to generate a control action as follows.

$$U^k_t = \gamma^k_t(\Lambda^k_t). \tag{9}$$

The system dynamics and the cost are the same as in the original problem. At the next time step, the
coordinator observes the new common observation

$$Z_{t+1} := \{Y^1_{t-n+1}, Y^2_{t-n+1}, U^1_{t-n+1}, U^2_{t-n+1}\}. \tag{10}$$

Thus at the next time, the coordinator knows $\Delta_{t+1} = Z_{t+1} \cup \Delta_t$ and its choice of all past
partial functions, and it selects the next partial functions for each controller. The system proceeds
sequentially in this manner until time horizon T.

In the above formulation, the only decision maker is the coordinator: the individual controllers
simply carry out the necessary evaluations prescribed by (9). At time t, the coordinator knows the
common (shared) information $\Delta_t$ and all past partial functions $\gamma^1_{1:t-1}$ and $\gamma^2_{1:t-1}$. The coordinator
uses a decision rule $\psi_t$ to map this information to its decision, that is,

$$(\gamma^1_t, \gamma^2_t) = \psi_t(\Delta_t, \gamma^1_{1:t-1}, \gamma^2_{1:t-1}), \tag{11}$$

or equivalently,

$$\gamma^k_t = \psi^k_t(\Delta_t, \gamma^1_{1:t-1}, \gamma^2_{1:t-1}), \quad k = 1, 2. \tag{12}$$

The choice of $\psi = \{\psi_t;\ t = 1, \dots, T\}$ is called a coordination strategy. $\Psi$ denotes the class of
all possible coordination strategies. The performance of a coordination strategy is given by the
expected total cost under that strategy, that is,

$$J(\psi) = \mathbb{E}^{\psi}\left\{ \sum_{t=1}^{T} c_t(X_t, U^1_t, U^2_t) \right\} \tag{13}$$

where the expectation is with respect to the joint measure on all the system variables induced by
the choice of $\psi$. The coordinator has to solve the following optimization problem.

Problem 2 (The Coordinator's Optimization Problem) Given the system model of Problem 1,
choose a coordination strategy $\psi^*$ from $\Psi$ that minimizes the expected cost given by (13).
Stage 2 

We now show that Problem 2 is equivalent to Problem 1. Specifically, we will show that any
design $g$ for Problem 1 can be implemented by the coordinator in Problem 2 with the same value of
the problem objective. Conversely, any coordination strategy $\psi$ in Problem 2 can be implemented
in Problem 1 with the same value of the performance objective.

Any design $g$ for Problem 1 can be implemented by the coordinator in Problem 2 as follows. At
time t the coordinator selects partial functions $(\gamma^1_t, \gamma^2_t)$ using the common (shared) information $\delta_t$
as follows.

$$\gamma^k_t(\cdot) = g^k_t(\cdot, \delta_t) =: \psi^k_t(\delta_t), \quad k = 1, 2. \tag{14}$$

Consider Problems 1 and 2. Use design $g$ in Problem 1 and the coordination strategy $\psi$ given
by (14) in Problem 2. Fix a specific realization of the initial state $X_0$, the plant disturbance
$\{V_t;\ t = 1, \dots, T\}$, and the observation noise $\{W^1_t, W^2_t;\ t = 1, \dots, T\}$. Then, the choice of $\psi$
according to (14) implies that the realizations of the state $\{X_t;\ t = 1, \dots, T\}$, the observations
$\{Y^1_t, Y^2_t;\ t = 1, \dots, T\}$, and the control actions $\{U^1_t, U^2_t;\ t = 1, \dots, T\}$ are identical in Problems 1
and 2. Thus, any design $g$ for Problem 1 can be implemented by the coordinator in Problem 2 by
using a coordination strategy given by (14), and the total expected cost under $g$ in Problem 1 is the
same as the total expected cost under the coordination strategy given by (14) in Problem 2.

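By a similar argument, any coordination strategy $\psi$ for Problem 2 can be implemented by the
control stations in Problem 1 as follows. At time 1, both stations know $\delta_1$; so, both of them can
compute $\gamma^1_1 = \psi^1_1(\delta_1)$, $\gamma^2_1 = \psi^2_1(\delta_1)$. Then station k chooses action $u^k_1 = \gamma^k_1(\lambda^k_1)$. Thus,

$$g^k_1(\lambda^k_1, \delta_1) = \psi^k_1(\delta_1)(\lambda^k_1), \quad k = 1, 2. \tag{15a}$$

At time 2, both stations know $\delta_2$ and $\gamma^1_1, \gamma^2_1$, so both of them can compute $\gamma^k_2 = \psi^k_2(\delta_2, \gamma^1_1, \gamma^2_1)$,
k = 1, 2. Then station k chooses action $u^k_2 = \gamma^k_2(\lambda^k_2)$. Thus,

$$g^k_2(\lambda^k_2, \delta_2) = \psi^k_2(\delta_2, \gamma^1_1, \gamma^2_1)(\lambda^k_2), \quad k = 1, 2. \tag{15b}$$

Proceeding this way, at time t both stations know $\delta_t$ and $\gamma^1_{1:t-1}$ and $\gamma^2_{1:t-1}$, so both of them can
compute $(\gamma^1_t, \gamma^2_t) = \psi_t(\delta_t, \gamma^1_{1:t-1}, \gamma^2_{1:t-1})$. Then, station k chooses action $u^k_t = \gamma^k_t(\lambda^k_t)$. Thus,

$$g^k_t(\lambda^k_t, \delta_t) = \psi^k_t(\delta_t, \gamma^1_{1:t-1}, \gamma^2_{1:t-1})(\lambda^k_t), \quad k = 1, 2. \tag{15c}$$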

Now consider Problems 2 and 1. Use coordination strategy $\psi$ in Problem 2 and the design $g$ given
by (15) in Problem 1. Fix a specific realization of the initial state $X_0$, the plant disturbance $\{V_t;\ 
t = 1, \dots, T\}$, and the observation noise $\{W^1_t, W^2_t;\ t = 1, \dots, T\}$. Then, the choice of $g$ according
to (15) implies that the realizations of the state $\{X_t;\ t = 1, \dots, T\}$, the observations $\{Y^1_t, Y^2_t;\ 
t = 1, \dots, T\}$, and the control actions $\{U^1_t, U^2_t;\ t = 1, \dots, T\}$ are identical in Problems 2 and 1.
Hence, any coordination strategy $\psi$ for Problem 2 can be implemented by the stations in Problem 1
by using a design given by (15), and the total expected cost under $\psi$ in Problem 2 is the same as the
total expected cost under the design given by (15) in Problem 1.

Since Problems 1 and 2 are equivalent, we derive structural results for the latter problem. Unlike
Problem 1, where we have multiple control stations, the coordinator is the only decision maker in
Problem 2.
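
The construction in (14) amounts to fixing the common-information argument of a control law and returning the resulting function of the private information. The sketch below illustrates this with a closure. The design `toy_design` and the call signature `g(k, t, lam, delta)` are hypothetical stand-ins chosen for the example, not part of the model.

```python
def strategy_from_design(g):
    """Build psi as in (14): psi(k, t, delta) is the partial function g^k_t(., delta)."""
    def psi(k, t, delta):
        # The returned function still awaits the private information lam.
        return lambda lam: g(k, t, lam, delta)
    return psi


def toy_design(k, t, lam, delta):
    """Hypothetical control law: a parity rule on private and shared data."""
    return (sum(lam) + sum(delta) + k + t) % 2


psi = strategy_from_design(toy_design)
gamma = psi(1, 3, delta=(1, 0))      # chosen by the coordinator from shared data only
action_via_coordinator = gamma((1, 1))
action_via_design = toy_design(1, 3, (1, 1), (1, 0))
```

Both routes produce the same action for every realization, which is the sample-path argument behind the equivalence of the two problems.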

Stage 3 

We now look at Problem 2 as a controlled input-output system from the point of view of the
coordinator and identify a state sufficient for input-output mapping. From the coordinator's viewpoint,
the input at time t has two components: a stochastic input that consists of the plant disturbance $V_t$
and observation noises $W^1_t, W^2_t$; and a controlled input that consists of the partial functions $\gamma^1_t, \gamma^2_t$.
The output is the observations $Z_{t+1}$ given by (10). The cost is given by $c_t(X_t, U^1_t, U^2_t)$. We want
to identify a state sufficient for input-output mapping for this system.

A variable is a state sufficient for input-output mapping of a control system if it satisfies the
following properties (see [5]).

P1) The next state is a function of the current state and the current inputs.

P2) The current output is a function of the current state and the current inputs.

P3) The instantaneous cost is a function of the current state, the current control inputs, and the
next state.

We claim that such a state for Problem 2 is the following.
Definition 1 For each t define

$$S_t := (X_{t-1}, \Lambda^1_t, \Lambda^2_t) \tag{16}$$

□

Next we show that $S_t$, t = 1, 2, ..., T + 1, satisfy properties (P1)-(P3). Specifically, we have the
following.

Proposition 1

1. There exist functions $\hat{f}_t$, t = 2, ..., T such that

$$S_{t+1} = \hat{f}_{t+1}(S_t, V_t, W^1_{t+1}, W^2_{t+1}, \gamma^1_t, \gamma^2_t). \tag{17}$$

2. There exist functions $\hat{h}_t$, t = 2, ..., T such that

$$Z_t = \hat{h}_t(S_{t-1}). \tag{18}$$

3. There exist functions $\hat{c}_t$, t = 1, ..., T such that

$$c_t(X_t, U^1_t, U^2_t) = \hat{c}_t(S_t, \gamma^1_t, \gamma^2_t, S_{t+1}). \tag{19}$$

□

Proof Part 1 is an immediate consequence of the definitions of $S_t$ and $\Lambda^k_t$, the dynamics of the
system given by (1), and the evaluations carried out by the control stations according to (9). Part 2
is an immediate consequence of the definitions of state $S_t$, observation $Z_t$, and private information
$\Lambda^k_t$. Part 3 is an immediate consequence of the definition of state and the evaluations carried out
by the control stations according to (9). ■

Stage 4 

Proposition 1 establishes $S_t$ as the state sufficient for input-output mapping for the coordinator's
problem. We now define information states for the coordinator.

Definition 2 (Information States) For a coordination strategy $\psi$, define information states $\Pi_t$
as

$$\Pi_t(s_t) := \mathbb{P}^{\psi}(S_t = s_t \mid \Delta_t, \gamma^1_{1:t-1}, \gamma^2_{1:t-1}). \tag{20}$$

□
As shown in Proposition 1, the state evolution of $S_t$ depends on the controlled inputs $(\gamma^1_t, \gamma^2_t)$ and
the random noise $(V_t, W^1_{t+1}, W^2_{t+1})$. This random noise is independent across time. Consequently,
$\Pi_t$ evolves in a controlled Markovian manner as below.

Proposition 2 For t = 1, ..., T - 1, there exist functions $F_t$ (which do not depend on the
coordinator's strategy) such that

$$\Pi_{t+1} = F_{t+1}(\Pi_t, \gamma^1_t, \gamma^2_t, Z_{t+1}). \tag{21}$$

□

Proof See Appendix A. ■
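
The update in (21) is a Bayes-rule recursion of the kind used in nonlinear filtering. The following sketch shows one generic step for finite spaces. It assumes a hypothetical interface `kernel(s, gamma1, gamma2)` that returns the joint probabilities of the next state and the next common observation given the current state and the chosen partial functions; neither this interface nor the toy kernel below is taken from the paper.

```python
def update_belief(pi, kernel, gamma1, gamma2, z):
    """One Bayes step: the belief on the next state given the new common observation z.

    pi: dict mapping state -> probability.
    kernel(s, gamma1, gamma2): dict mapping (next_state, observation) -> probability
        (an assumed interface for this sketch).
    """
    unnormalized = {}
    for s, p in pi.items():
        for (s_next, z_next), q in kernel(s, gamma1, gamma2).items():
            if z_next == z:
                unnormalized[s_next] = unnormalized.get(s_next, 0.0) + p * q
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: w / total for s, w in unnormalized.items()}


def noisy_copy_kernel(s, gamma1, gamma2):
    """Toy kernel: the state stays put and is observed correctly with probability 0.8."""
    return {(s, s): 0.8, (s, 1 - s): 0.2}


posterior = update_belief({0: 0.5, 1: 0.5}, noisy_copy_kernel, None, None, z=0)
```

The update uses only the current belief, the chosen partial functions, and the new observation, and not the coordinator's strategy, which mirrors the statement of Proposition 2.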

At t = 1, since there is no shared information, $\Pi_1$ is simply the unconditional probability $\mathbb{P}(S_1) =
\mathbb{P}(X_0, Y^1_1, Y^2_1)$. Thus, $\Pi_1$ is fixed a priori from the joint distribution of the primitive random
variables and does not depend on the choice of the coordinator's strategy $\psi$. Proposition 2 shows
that at t = 2, ..., T, $\Pi_t$ depends on the strategy $\psi$ only through the choices of $\gamma^1_{1:t-1}$ and $\gamma^2_{1:t-1}$.
Moreover, as shown in Proposition 1, the instantaneous cost at time t can be written in terms of
the current and next states $(S_t, S_{t+1})$ and the control inputs $(\gamma^1_t, \gamma^2_t)$. Combining the above two
properties, we get the following:

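Proposition 3 The process $\Pi_t$, t = 1, 2, ..., T is a controlled Markov chain with $\gamma^1_t, \gamma^2_t$ as the
control actions at time t, i.e.,

$$\mathbb{P}(\Pi_{t+1} \mid \Delta_t, \Pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}) = \mathbb{P}(\Pi_{t+1} \mid \Pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}) = \mathbb{P}(\Pi_{t+1} \mid \Pi_t, \gamma^1_t, \gamma^2_t). \tag{22}$$

Furthermore, there exists a deterministic function $C_t$ such that

$$\mathbb{E}\{\hat{c}_t(S_t, \gamma^1_t, \gamma^2_t, S_{t+1}) \mid \Delta_t, \Pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}\} = C_t(\Pi_t, \gamma^1_t, \gamma^2_t). \tag{23}$$

□

Proof See Appendix B. ■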

The controlled Markov property of the process $\{\Pi_t;\ t = 1, \dots, T\}$ immediately gives rise to the
following structural result.

Theorem 1 In Problem 2, without loss of optimality we can restrict attention to coordination
strategies of the form

$$(\gamma^1_t, \gamma^2_t) = \psi_t(\Pi_t), \quad t = 1, \dots, T. \tag{24}$$

□

Proof From Proposition 3, we conclude that the optimization problem for the coordinator is to
control the evolution of the controlled Markov process $\{\Pi_t;\ t = 1, 2, \dots, T\}$ by selecting the partial
functions $\{\gamma^1_t, \gamma^2_t;\ t = 1, 2, \dots, T\}$ in order to minimize $\sum_{t=1}^{T} \mathbb{E}\{C_t(\Pi_t, \gamma^1_t, \gamma^2_t)\}$. This is an instance
of the well-known Markov decision problems, where it is known that the optimal strategy is a
function of the current state. Thus, the structural result follows from Markov decision theory [1]. ■

The above result can also be stated in terms of the original problem. 
Theorem 2 (Structural Result) In Problem 1 with K = 2, without loss of optimality we can
restrict attention to control strategies of the form

$$U^k_t = g^k_t(\Lambda^k_t, \Pi_t), \quad k = 1, 2, \tag{25}$$

where

$$\Pi_t = \mathbb{P}^{(g^1_{1:t-1}, g^2_{1:t-1})}(X_{t-1}, \Lambda^1_t, \Lambda^2_t \mid \Delta_t) \tag{26}$$

where $\Pi_1 = \mathbb{P}(X_0, Y^1_1, Y^2_1)$ and for t = 2, ..., T, $\Pi_t$ is evaluated as follows:

$$\Pi_{t+1} = F_{t+1}(\Pi_t, g^1_t(\cdot, \Pi_t), g^2_t(\cdot, \Pi_t), Z_{t+1}) \tag{27}$$



□ 



Proof Theorem 1 established the structure of the optimal coordination strategy. As we argued
in Stage 2, this optimal coordination strategy can be implemented in Problem 1 and is optimal for
the objective (4). At t = 1, $\Pi_1 = \mathbb{P}(X_0, Y^1_1, Y^2_1)$ is known to both controllers and they can use the
optimal coordination strategy to select partial functions according to:

$$(\gamma^1_1, \gamma^2_1) = \psi_1(\Pi_1)$$

Thus,

$$U^k_1 = \gamma^k_1(\Lambda^k_1) = \psi^k_1(\Pi_1)(\Lambda^k_1) =: g^k_1(\Lambda^k_1, \Pi_1), \quad k = 1, 2. \tag{28}$$

At time instant t + 1, both controllers know $\Pi_t$ and the common observations $Z_{t+1} = \{Y^1_{t-n+1}, Y^2_{t-n+1},
U^1_{t-n+1}, U^2_{t-n+1}\}$; they use the partial functions $(g^1_t(\cdot, \Pi_t), g^2_t(\cdot, \Pi_t))$ in equation (21) to evaluate
$\Pi_{t+1}$. The control actions at time t + 1 are given as:

$$U^k_{t+1} = \gamma^k_{t+1}(\Lambda^k_{t+1}) = \psi^k_{t+1}(\Pi_{t+1})(\Lambda^k_{t+1}) =: g^k_{t+1}(\Lambda^k_{t+1}, \Pi_{t+1}), \quad k = 1, 2. \tag{29}$$

Moreover, using the design $g$ defined according to (29), the coordinator's information state $\Pi_t$ can
also be written as:

$$\begin{aligned}
\Pi_t &= \mathbb{P}^{\psi}(X_{t-1}, \Lambda^1_t, \Lambda^2_t \mid \Delta_t, \gamma^1_{1:t-1}, \gamma^2_{1:t-1}) \\
&= \mathbb{P}^{g}(X_{t-1}, \Lambda^1_t, \Lambda^2_t \mid \Delta_t, g^{1:2}_1(\cdot, \Pi_1), \dots, g^{1:2}_{t-1}(\cdot, \Pi_{t-1})) \\
&= \mathbb{P}^{(g^1_{1:t-1}, g^2_{1:t-1})}(X_{t-1}, \Lambda^1_t, \Lambda^2_t \mid \Delta_t)
\end{aligned} \tag{30}$$

where we dropped the partial functions from the conditioning terms in (30) because under the given
control laws $(g^1_{1:t-1}, g^2_{1:t-1})$, the partial functions used from time 1 to t - 1 can be evaluated from
$\Delta_t$ (by using Proposition 2 to evaluate $\Pi_{1:t-1}$). ■

Theorem 2 establishes the first structural result stated in Section 1.4 for K = 2. In the next
section, we show how to extend the result to general K.

2.2. Extension to General K 

Theorem 2 for two controllers (K = 2) can be easily extended to general K by following the same
sequence of arguments as in Stages 1 to 4 above. Thus, at time t, the coordinator introduced in
Stage 1 now selects partial functions $\gamma^k_t : \mathcal{C}^k \to \mathcal{U}^k$, for k = 1, 2, ..., K. The state sufficient for
input-output mapping from the coordinator's perspective is given as $S_t := (X_{t-1}, \Lambda^{1:K}_t)$ and the
information state $\Pi_t$ for the coordinator is

$$\Pi_t(s_t) := \mathbb{P}^{\psi}(S_t = s_t \mid \Delta_t, \gamma^{1:K}_{1:t-1}). \tag{31}$$

Results analogous to Propositions 1-3 can now be used to conclude the structural result of
Theorem 2 for general K.
2.3. Sequential Decomposition

In addition to obtaining the structural result of Theorem 2, the coordinator's problem also allows
us to write a dynamic program for finding the optimal control strategies as shown below. We first
focus on the two-controller case (K = 2) and then extend the result to general K.

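Theorem 3 The optimal coordination strategy can be found by the following dynamic program:
For t = 1, ..., T, define the functions $J_t : \mathcal{P}\{\mathcal{S}\} \to \mathbb{R}$ as follows. For $\pi \in \mathcal{P}\{\mathcal{S}\}$ let

$$J_T(\pi) = \inf_{\tilde{\gamma}^1, \tilde{\gamma}^2} \mathbb{E}\{C_T(\Pi_T, \gamma^1_T, \gamma^2_T) \mid \Pi_T = \pi, \gamma^1_T = \tilde{\gamma}^1, \gamma^2_T = \tilde{\gamma}^2\}. \tag{32}$$

For t = 1, ..., T - 1, and $\pi \in \mathcal{P}\{\mathcal{S}\}$ let

$$J_t(\pi) = \inf_{\tilde{\gamma}^1, \tilde{\gamma}^2} \mathbb{E}\{C_t(\Pi_t, \gamma^1_t, \gamma^2_t) + J_{t+1}(\Pi_{t+1}) \mid \Pi_t = \pi, \gamma^1_t = \tilde{\gamma}^1, \gamma^2_t = \tilde{\gamma}^2\}. \tag{33}$$

The arg inf $(\gamma^{*,1}_t, \gamma^{*,2}_t)$ in the RHS of $J_t(\pi)$ is the optimal action for the coordinator at time t when
$\Pi_t = \pi$. Thus,

$$(\gamma^{*,1}_t, \gamma^{*,2}_t) = \psi^*_t(\pi).$$

The corresponding control strategy for Problem 1, given by (15), is optimal for Problem 1. □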

Proof As in Theorem 1, we use the fact that the coordinator's optimization problem can be viewed
as a Markov decision problem with $\Pi_t$ as the state of the Markov process. The dynamic program
follows from standard results in Markov decision theory [1]. The optimality of the corresponding
control strategy for Problem 1 follows from the equivalence between the two problems. ■
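
The recursion in (32)-(33) can be sketched as an exhaustive backward search over the coordinator's choices. The version below is a naive illustration rather than an efficient solver. It assumes two hypothetical callables: `cost(t, pi, gamma)`, returning the expected instantaneous cost, and `step(t, pi, gamma)`, returning the possible next beliefs with their probabilities. In the toy usage the belief is replaced by a single number purely to keep the example short.

```python
def coordinator_value(t, T, pi, prescriptions, cost, step):
    """Return (value, best choice) at belief pi and time t by exhaustive recursion.

    prescriptions: list of candidate choices for the coordinator (e.g. pairs of partial functions).
    cost(t, pi, gamma): expected instantaneous cost (assumed interface).
    step(t, pi, gamma): list of (probability, next_belief) pairs (assumed interface).
    """
    best_value, best_gamma = float("inf"), None
    for gamma in prescriptions:
        value = cost(t, pi, gamma)
        if t < T:  # add the expected cost-to-go, as in (33)
            for prob, next_pi in step(t, pi, gamma):
                value += prob * coordinator_value(t + 1, T, next_pi, prescriptions, cost, step)[0]
        if value < best_value:
            best_value, best_gamma = value, gamma
    return best_value, best_gamma


def toy_cost(t, pi, gamma):
    """Toy cost: waiting costs pi, acting costs a flat 0.6."""
    return pi if gamma == "wait" else 0.6


def toy_step(t, pi, gamma):
    """Toy dynamics: the (scalar stand-in for the) belief never changes."""
    return [(1.0, pi)]


value, choice = coordinator_value(1, 2, 0.25, ["wait", "act"], toy_cost, toy_step)
```

The running time of this sketch grows exponentially with the horizon, since each step branches over all choices and all observations.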

The dynamic program of Theorem 3 can be extended to general K in a manner similar to
Section 2.2.

2.4. Computational Aspects 

In the dynamic program for the coordinator in Theorem 3, the value functions at each time are
functions defined on the continuous space $\mathcal{P}\{\mathcal{S}\}$, whereas the minimization at each time step is over
the finite set of functions from the space of realizations of the private information of controllers
($\mathcal{C}^k$, k = 1, 2) to the space of control actions ($\mathcal{U}^k$, k = 1, 2). While dynamic programs with
continuous state space can be hard to solve, we note that our dynamic program resembles the
dynamic program for partially observable Markov decision problems (POMDP). In particular, just
as in POMDP, the value function at time T is piecewise linear in $\Pi_T$, and by standard backward
recursion, it can be shown that the value function at time t is a piecewise linear and concave function
of $\Pi_t$. (See Appendix C.) Indeed, the coordinator's problem can be viewed as a POMDP, with
$S_t$ as the underlying partially observed state and the belief $\Pi_t$ as the information state of the
POMDP. The characterization of value functions as piecewise linear and concave is utilized to find
computationally efficient algorithms for POMDPs. Such algorithmic solutions to general POMDPs
are well studied and can be employed here. We refer the reader to [6] and references therein for a
review of algorithms to solve POMDPs.
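
A piecewise linear and concave function of the belief can be stored as a finite set of linear functions and evaluated as their pointwise minimum. The sketch below shows this representation. The name `pwlc_value` and the particular vectors are hypothetical illustrations; they are not the value functions of this problem.

```python
def pwlc_value(belief, alpha_vectors):
    """Evaluate the minimum of finitely many linear functions of the belief."""
    return min(
        sum(a * b for a, b in zip(alpha, belief))
        for alpha in alpha_vectors
    )


# Two linear pieces on a two-point state space (illustrative numbers).
alphas = [(0.0, 1.0), (1.0, 0.0)]
left = pwlc_value((1.0, 0.0), alphas)
right = pwlc_value((0.0, 1.0), alphas)
middle = pwlc_value((0.5, 0.5), alphas)
```

The value at the midpoint is no smaller than the average of the values at the end points, as concavity requires. POMDP algorithms typically exploit this structure by propagating the finite set of vectors backward in time instead of tabulating the value function over the belief space.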

2.5. One-step Delay 

We now focus on the one-step delayed sharing information structure, i.e., when n = 1. For this case,
the structural result (5) asserted by Witsenhausen is correct [3]. At first glance, that structural
result looks different from our structural result (7) for n = 1. In this section, we show that for
n = 1, these two structural results are equivalent.

As before, we consider the two-controller system ($K = 2$). When the delay is $n = 1$, we have 

\[ \Delta_t = (Y^1_{1:t-1}, Y^2_{1:t-1}, U^1_{1:t-1}, U^2_{1:t-1}), \]

\[ \Lambda^1_t = (Y^1_t), \quad \Lambda^2_t = (Y^2_t), \]

and 

\[ Z_{t+1} = (Y^1_t, Y^2_t, U^1_t, U^2_t). \]

The result of Theorem 2 can now be restated for this case as follows: 

Corollary 1 In Problem 1 with $K = 2$ and $n = 1$, without loss of optimality we can restrict 
attention to control strategies of the form: 

\[ U^k_t = g^k_t(Y^k_t, \Pi_t), \quad k = 1,2, \tag{34} \]

where 

\[ \Pi_t := \mathbb{P}^{(g^1_{1:t-1},\, g^2_{1:t-1})}(X_{t-1}, Y^1_t, Y^2_t \mid \Delta_t). \tag{35} \]

□ 

We can now compare our result for one-step delay with the structural result (5), asserted in [2] 
and proved in [3]. For $n = 1$, this result states that without loss of optimality, we can restrict 
attention to control laws of the form: 

\[ U^k_t = g^k_t(Y^k_t, \mathbb{P}(X_{t-1} \mid \Delta_t)), \quad k = 1,2. \tag{36} \]

The above structural result can be recovered from (35) by observing that there is a one-to-one 
correspondence between $\Pi_t$ and the belief $\mathbb{P}(X_{t-1} \mid \Delta_t)$. We first note that 

\[
\begin{aligned}
\Pi_t &= \mathbb{P}^{(g^1_{1:t-1},\, g^2_{1:t-1})}(X_{t-1}, Y^1_t, Y^2_t \mid \Delta_t) \\
&= \mathbb{P}(Y^1_t \mid X_{t-1}) \cdot \mathbb{P}(Y^2_t \mid X_{t-1}) \cdot \mathbb{P}^{(g^1_{1:t-1},\, g^2_{1:t-1})}(X_{t-1} \mid \Delta_t).
\end{aligned}
\tag{37}
\]

As pointed out in [2], [3] (and proved later in this paper in Proposition 4), the last probability does 
not depend on the functions $(g^1_{1:t-1}, g^2_{1:t-1})$. Therefore, 

\[ \Pi_t = \mathbb{P}(Y^1_t \mid X_{t-1}) \cdot \mathbb{P}(Y^2_t \mid X_{t-1}) \cdot \mathbb{P}(X_{t-1} \mid \Delta_t). \tag{38} \]

Clearly, the belief $\mathbb{P}(X_{t-1} \mid \Delta_t)$ is a marginal of $\Pi_t$ and therefore can be evaluated from $\Pi_t$. 
Moreover, given the belief $\mathbb{P}(X_{t-1} \mid \Delta_t)$, one can evaluate $\Pi_t$ using equation (38). This one-to-one 
correspondence between $\Pi_t$ and $\mathbb{P}(X_{t-1} \mid \Delta_t)$ means that the structural result proposed in this 
paper for $n = 1$ is effectively equivalent to the one proved in [3]. 
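
The one-to-one correspondence can be checked numerically with a minimal Python sketch. It assumes finite state and observation spaces stored as dictionaries; the function names and the numbers in the example are hypothetical and not part of the paper.

```python
from typing import Dict, Hashable, Tuple

Belief = Dict[Hashable, float]
ObsModel = Dict[Hashable, Dict[Hashable, float]]   # obs[x][y] = P(y | x)


def pi_from_belief(belief: Belief, obs1: ObsModel, obs2: ObsModel) -> Dict[Tuple, float]:
    """Eq. (38): Pi(x, y1, y2) = P(y1 | x) * P(y2 | x) * belief(x)."""
    return {
        (x, y1, y2): obs1[x][y1] * obs2[x][y2] * p
        for x, p in belief.items()
        for y1 in obs1[x]
        for y2 in obs2[x]
    }


def belief_from_pi(pi: Dict[Tuple, float]) -> Belief:
    """Marginalize Pi over (y1, y2) to recover the belief on the state."""
    out: Belief = {}
    for (x, _y1, _y2), p in pi.items():
        out[x] = out.get(x, 0.0) + p
    return out


# Hypothetical two-state example (illustrative numbers only).
belief = {0: 0.3, 1: 0.7}
obs1 = {0: {"a": 0.9, "b": 0.1}, 1: {"a": 0.2, "b": 0.8}}
obs2 = {0: {"a": 0.5, "b": 0.5}, 1: {"a": 0.6, "b": 0.4}}

pi = pi_from_belief(belief, obs1, obs2)
recovered = belief_from_pi(pi)
```

In this sketch the round trip `belief_from_pi(pi_from_belief(...))` returns the original belief up to floating-point error, which mirrors the argument that either quantity can be computed from the other.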

3. Proof of the second structural result 

In this section we prove the second structural result (8). As in Section 2, we prove the result for 
$K = 2$ and then show how to extend it for general $K$. To prove the result, we reconsider the 
coordinator's problem at Stage 3 of Section 2 and present an alternative characterization for the 
coordinator's optimal strategy in Problem 2. The main idea in this section is to use the dynamics 
of the system evolution and the observation equations (equations (1) and (2)) to find an equivalent 
representation of the coordinator's information state. We also contrast this information state with 
that proposed by Witsenhausen. 

3.1. Two controller system (K = 2) 

Consider the coordinator's problem with $K = 2$. Recall that $\gamma^1_t$ and $\gamma^2_t$ are the coordinator's actions 
at time $t$; $\gamma^k_t$ maps the private information of the $k$-th controller, $(Y^k_{t-n+1:t}, U^k_{t-n+1:t-1})$, to its action 
$U^k_t$. In order to find an alternate characterization of the coordinator's optimal strategy, we need the 
following definitions: 

Definition 3 For a coordination strategy $\psi$, and for $t = 1, 2, \dots, T$, we define the following: 

1. $\Theta_t := \mathbb{P}(X_{t-n} \mid \Delta_t)$ 

2. For $k = 1,2$, define the following partial functions of $\gamma^k_m$: 

\[ r^k_{m,t}(\cdot) := \gamma^k_m(\cdot, Y^k_{m-n+1:t-n}, U^k_{m-n+1:t-n}), \quad m = t-n+1, t-n+2, \dots, t-1. \tag{39} \]

Since $\gamma^k_m$ is a function that maps $(Y^k_{m-n+1:m}, U^k_{m-n+1:m-1})$ to $U^k_m$, $r^k_{m,t}(\cdot)$ is a function that 
maps $(Y^k_{t-n+1:m}, U^k_{t-n+1:m-1})$ to $U^k_m$. We define a collection of these partial functions as 
follows: 

\[ r^k_t := (r^k_{m,t},\ m = t-n+1, t-n+2, \dots, t-1). \tag{40} \]

Note that for $n = 1$, $r^k_t$ is empty. □ 
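
The partial functions of Definition 3 amount to fixing the arguments of a prescription that are already part of the common information. A minimal Python sketch follows, assuming prescriptions are ordinary callables on time-ordered tuples of observations and actions; `partial_indices`, `partial_function`, and the toy prescription `gamma_4` are hypothetical names used only for illustration.

```python
from typing import Callable, List, Tuple

Prescription = Callable[[Tuple[int, ...], Tuple[int, ...]], int]


def partial_indices(t: int, n: int) -> List[int]:
    """Times m = t-n+1, ..., t-1 for which a partial function r_{m,t} exists."""
    return list(range(t - n + 1, t))


def partial_function(gamma_m: Prescription,
                     shared_y: Tuple[int, ...],
                     shared_u: Tuple[int, ...]) -> Prescription:
    """Fix the commonly known arguments (Y_{m-n+1:t-n}, U_{m-n+1:t-n}) of gamma_m.

    The returned function takes only the still-private arguments
    (Y_{t-n+1:m}, U_{t-n+1:m-1}), as in Eq. (39).
    """
    def r(private_y: Tuple[int, ...], private_u: Tuple[int, ...]) -> int:
        return gamma_m(tuple(shared_y) + tuple(private_y),
                       tuple(shared_u) + tuple(private_u))
    return r


# Toy example with n = 3, t = 5, m = 4: gamma_4 acts on (Y_{2:4}, U_{2:3}).
def gamma_4(ys: Tuple[int, ...], us: Tuple[int, ...]) -> int:
    return (sum(ys) + sum(us)) % 2


# At t = 5 the common information contains Y_2 = 1 and U_2 = 0.
r_4_5 = partial_function(gamma_4, shared_y=(1,), shared_u=(0,))
```

With these definitions, `r_4_5((y3, y4), (u3,))` agrees with `gamma_4((1, y3, y4), (0, u3))`, and `partial_indices(t, 1)` is empty, matching the remark that $r^k_t$ is empty for $n = 1$.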

We need the following results to address the coordinator's problem: 

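Proposition 4 1. For $t = 1, \dots, T-1$, there exist functions $Q_t$, $Q^k_t$, $k = 1,2$ (which do not 
depend on the coordinator's strategy) such that 

\[
\begin{aligned}
\Theta_{t+1} &= Q_t(\Theta_t, Z_{t+1}), \\
r^k_{t+1} &= Q^k_t(r^k_t, Z_{t+1}, \gamma^k_t).
\end{aligned}
\tag{41}
\]

2. The coordinator's information state $\Pi_t$ is a function of $(\Theta_t, r^1_t, r^2_t)$. Consequently, for $t = 1, \dots, T$, 
there exist functions $\tilde{C}_t$ (which do not depend on the coordinator's strategy) such 
that 

\[ \mathbb{E}\{c_t(S_t, \gamma^1_t, \gamma^2_t, S_{t+1}) \mid \Delta_t, \Pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}\} = \tilde{C}_t(\Theta_t, r^1_t, r^2_t, \gamma^1_t, \gamma^2_t). \tag{42} \]

3. The process $(\Theta_t, r^1_t, r^2_t)$, $t = 1, 2, \dots, T$, is a controlled Markov chain with $\gamma^1_t, \gamma^2_t$ as the control 
actions at time $t$, i.e., 

\[
\begin{aligned}
&\mathbb{P}(\Theta_{t+1}, r^1_{t+1}, r^2_{t+1} \mid \Delta_t, \Theta_{1:t}, r^1_{1:t}, r^2_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}) \\
&\quad = \mathbb{P}(\Theta_{t+1}, r^1_{t+1}, r^2_{t+1} \mid \Theta_{1:t}, r^1_{1:t}, r^2_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}) \\
&\quad = \mathbb{P}(\Theta_{t+1}, r^1_{t+1}, r^2_{t+1} \mid \Theta_t, r^1_t, r^2_t, \gamma^1_t, \gamma^2_t).
\end{aligned}
\tag{43}
\]

□ 

Proof See Appendix D. ■ 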

At $t = 1$, since there is no sharing of information, $\Theta_1$ is simply the unconditioned probability 
$\mathbb{P}(X_0)$. Thus, $\Theta_1$ is fixed a priori from the joint distribution of the primitive random variables 
and does not depend on the choice of the coordinator's strategy $\psi$. Proposition 4 shows that the 
update of $\Theta_t$ depends only on $Z_{t+1}$ and not on the coordinator's strategy. Consequently, the belief 
$\Theta_t$ depends only on the distribution of the primitive random variables and the realizations of $Z_{1:t}$. 
We can now show that the coordinator's optimization problem can be viewed as an MDP with 
$(\Theta_t, r^1_t, r^2_t)$, $t = 1, 2, \dots, T$, as the underlying Markov process. 
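
The strategy-independent update of $\Theta_t$ can be sketched in Python, following the calculation carried out in Appendix D: condition the current belief on the newly shared observations, then propagate it through the state transition under the newly shared actions. The sketch assumes finite spaces and time-invariant kernels stored as dictionaries; `update_theta` and the model numbers are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Belief = Dict[Hashable, float]


def update_theta(theta: Belief,
                 z: Tuple[Hashable, Hashable, Hashable, Hashable],
                 trans: Dict[Tuple, Dict[Hashable, float]],
                 obs1: Dict[Hashable, Dict[Hashable, float]],
                 obs2: Dict[Hashable, Dict[Hashable, float]]) -> Belief:
    """One step Theta_{t+1} = Q_t(Theta_t, Z_{t+1}) with z = (y1, y2, u1, u2).

    No control law or prescription appears among the arguments.
    """
    y1, y2, u1, u2 = z
    # Condition on the newly shared observations (Bayes' rule).
    weights = {x: obs1[x][y1] * obs2[x][y2] * p for x, p in theta.items()}
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("observations have zero probability under theta")
    # Propagate through the transition kernel under the shared actions.
    new_theta: Belief = {}
    for x, w in weights.items():
        for x_next, q in trans[(x, u1, u2)].items():
            new_theta[x_next] = new_theta.get(x_next, 0.0) + (w / total) * q
    return new_theta


# Hypothetical two-state model in which the state never changes.
theta = {0: 0.5, 1: 0.5}
obs1 = {0: {"a": 0.8, "b": 0.2}, 1: {"a": 0.4, "b": 0.6}}
obs2 = {0: {"a": 0.5, "b": 0.5}, 1: {"a": 0.5, "b": 0.5}}
trans = {(x, u1, u2): {x: 1.0} for x in (0, 1) for u1 in (0, 1) for u2 in (0, 1)}

theta_next = update_theta(theta, ("a", "a", 0, 0), trans, obs1, obs2)
```

The signature makes the point of the paragraph above concrete: the next belief is computed from the previous belief and the realization of $Z_{t+1}$ alone.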
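Theorem 4 $(\Theta_t, r^1_t, r^2_t)$ is an information state for the coordinator. That is, there is an optimal 
coordination strategy of the form: 

\[ (\gamma^1_t, \gamma^2_t) = \psi_t(\Theta_t, r^1_t, r^2_t), \quad t = 1, \dots, T. \tag{44} \]

Moreover, this optimal coordination strategy can be found by the following dynamic program: 

\[ J_T(\theta, \tilde{r}^1, \tilde{r}^2) = \inf_{\tilde{\gamma}^1, \tilde{\gamma}^2} \mathbb{E}\{\tilde{C}_T(\Theta_T, r^1_T, r^2_T, \gamma^1_T, \gamma^2_T) \mid \Theta_T = \theta, r^1_T = \tilde{r}^1, r^2_T = \tilde{r}^2, \gamma^1_T = \tilde{\gamma}^1, \gamma^2_T = \tilde{\gamma}^2\}. \tag{45} \]

For $t = 1, \dots, T-1$, let 

\[ J_t(\theta, \tilde{r}^1, \tilde{r}^2) = \inf_{\tilde{\gamma}^1, \tilde{\gamma}^2} \mathbb{E}\{\tilde{C}_t(\Theta_t, r^1_t, r^2_t, \gamma^1_t, \gamma^2_t) + J_{t+1}(\Theta_{t+1}, r^1_{t+1}, r^2_{t+1}) \mid \Theta_t = \theta, r^1_t = \tilde{r}^1, r^2_t = \tilde{r}^2, \gamma^1_t = \tilde{\gamma}^1, \gamma^2_t = \tilde{\gamma}^2\}, \tag{46} \]

where $\theta \in \mathcal{P}\{\mathcal{X}\}$, and $\tilde{r}^1, \tilde{r}^2$ are realizations of the partial functions defined in (39) and (40). The 
arg inf $(\gamma^{*,1}_t, \gamma^{*,2}_t)$ in the RHS of (46) is the optimal action for the coordinator at time $t$ when 
$(\Theta_t, r^1_t, r^2_t) = (\theta, \tilde{r}^1, \tilde{r}^2)$. Thus, 

\[ \psi^*_t(\theta, \tilde{r}^1, \tilde{r}^2) = (\gamma^{*,1}_t, \gamma^{*,2}_t). \]

The corresponding control strategy for Problem 1, given by (15), is optimal for Problem 1. □ 

Proof Proposition 4 implies that the coordinator's optimization problem can be viewed as an 
MDP with $(\Theta_t, r^1_t, r^2_t)$, $t = 1, 2, \dots, T$, as the underlying Markov process and $\tilde{C}_t(\Theta_t, r^1_t, r^2_t, \gamma^1_t, \gamma^2_t)$ 
as the instantaneous cost. The MDP formulation implies the result of the theorem. ■ 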
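
The backward recursion in (45) and (46) has the shape of a standard finite-horizon dynamic program. The following minimal Python sketch assumes that the information states reachable over the horizon and the candidate prescription pairs have already been enumerated into finite sets `states` and `actions`; `cost` and `transition` are hypothetical callables standing in for the instantaneous cost and the controlled Markov kernel of (43), and the toy model at the end is purely illustrative.

```python
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable


def backward_induction(states: List[State],
                       actions: List[Action],
                       horizon: int,
                       cost: Callable[[int, State, Action], float],
                       transition: Callable[[int, State, Action], Dict[State, float]]
                       ) -> Tuple[Dict[State, float], Dict[Tuple[int, State], Action]]:
    """Return the time-1 value function and a minimizing policy for t = 1..horizon."""
    value = {s: 0.0 for s in states}           # no cost after the final stage
    policy: Dict[Tuple[int, State], Action] = {}
    for t in range(horizon, 0, -1):            # t = T, T-1, ..., 1
        new_value = {}
        for s in states:
            best_q, best_a = None, None
            for a in actions:
                # Stage cost plus expected cost-to-go.
                q = cost(t, s, a) + sum(
                    p * value[s2] for s2, p in transition(t, s, a).items())
                if best_q is None or q < best_q:
                    best_q, best_a = q, a
            new_value[s] = best_q
            policy[(t, s)] = best_a
        value = new_value
    return value, policy


# Toy model: the state costs s, the action costs 0.5 * a and becomes the next state.
J1, policy = backward_induction(
    states=[0, 1], actions=[0, 1], horizon=2,
    cost=lambda t, s, a: s + 0.5 * a,
    transition=lambda t, s, a: {a: 1.0},
)
```

In the coordinator's problem each action is a pair of prescriptions, so the inner loop ranges over a finite but possibly very large set; the sketch makes no attempt to exploit structure in that set.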

The following result follows from Theorem 4. 

Theorem 5 (Second Structural Result) In Problem 1 with $K = 2$, without loss of optimality 
we can restrict attention to control strategies of the form 

\[ U^k_t = g^k_t(\Lambda^k_t, \Theta_t, r^1_t, r^2_t), \quad k = 1,2, \tag{47} \]

where 

\[ \Theta_t = \mathbb{P}(X_{t-n} \mid \Delta_t) \tag{48} \]

and 

\[ r^k_t = \{ g^k_m(\cdot, Y^k_{m-n+1:t-n}, U^k_{m-n+1:t-n}, \Delta_m),\ t-n+1 \le m \le t-1 \}. \tag{49} \]

Proof As in Theorem 2, equations (15) can be used to identify an optimal control strategy for 
each controller from the optimal coordination strategy given in Theorem 4. ■ 

Theorem 4 and Theorem 5 can be easily extended to $K$ controllers by identifying $(\Theta_t, r^{1:K}_t)$ as 
the information state for the coordinator. 

3.2. Comparison to Witsenhausen's Result 

We now compare the result of Theorem 5 to Witsenhausen's conjecture, which states that there 
exist optimal control strategies of the form: 

\[ U^k_t = g^k_t(\Lambda^k_t, \mathbb{P}(X_{t-n} \mid \Delta_t)). \tag{50} \]

Recall that Witsenhausen's conjecture is true for $n = 1$ but false for $n > 1$. Therefore, we consider 
the cases $n = 1$ and $n > 1$ separately: 

Delay n = 1 

For a two-controller system with $n = 1$, we have 

\[ \Delta_t = (Y^1_{1:t-1}, Y^2_{1:t-1}, U^1_{1:t-1}, U^2_{1:t-1}), \]

\[ \Lambda^1_t = (Y^1_t), \quad \Lambda^2_t = (Y^2_t), \]

and 

\[ r^1_t = \emptyset, \quad r^2_t = \emptyset. \]

Therefore, for $n = 1$, Theorem 5 implies that there exist optimal control strategies of the form: 

\[ U^k_t = g^k_t(\Lambda^k_t, \mathbb{P}(X_{t-n} \mid \Delta_t)), \quad k = 1,2. \tag{51} \]

Equation (51) is the same as equation (50) for $n = 1$. Thus, for $n = 1$, the result of Theorem 5 
coincides with Witsenhausen's conjecture, which was proved in [3]. 

Delay n > 1 

Witsenhausen's conjecture implied that controller $k$ at time $t$ can choose its action based only 
on the knowledge of $\Lambda^k_t$ and $\mathbb{P}(X_{t-n} \mid \Delta_t)$, without any dependence on the choice of previous control 
laws $(g^{1:2}_{1:t-1})$. In other words, the argument of the control law $g^k_t$ (that is, the information state 
at time $t$) is separated from $g^{1:2}_{1:t-1}$. However, as Theorem 5 shows, such a separation is not true 
because of the presence of the collection of partial functions $r^1_t, r^2_t$ in the argument of the optimal 
control law at time $t$. These partial functions depend on the choice of the previous $n - 1$ control laws. 
Thus, the argument of $g^k_t$ depends on the choice of $g^{1:2}_{t-n+1:t-1}$. One may argue that Theorem 5 can 
be viewed as a delayed or partial separation since the information state for the control law $g^k_t$ is 
separated from the choice of control laws before time $t - n + 1$. 

Witsenhausen's conjecture implied that controllers employ common information only to form a 
belief on the state $X_{t-n}$; the controllers do not need to use the common information to guess each 
other's behavior from $t - n + 1$ to the current time $t$. Our result disproves this statement. We show 
that in addition to forming the belief on $X_{t-n}$, each agent should use the common information to 
predict the actions of other agents by means of the partial functions $r^1_t, r^2_t$. 

4. A Special Case of Delayed Sharing Information Structure 

Many decentralized systems consist of coupled subsystems, where each subsystem has a controller 
that perfectly observes the state of the subsystem. If all controllers can exchange their observations 
and actions with a delay of n steps, then the system is a special case of the n-step delayed sharing 
information structure with the following assumptions: 

1. Assumption 1: At time $t = 1, \dots, T$, the state of the system is given as the vector $X_t := (X^{1:K}_t)$, 
where $X^i_t$ is the state of subsystem $i$. 

2. Assumption 2: The observation equation of the $k$-th controller is given as: 

\[ Y^k_t = X^k_t. \tag{52} \]

This model is the same as the model considered in [7]. Clearly, the first structural result and the 
sequential decomposition of Section 2 apply here as well, with the observations $Y^k_t$ being replaced 
by $X^k_t$. Our second structural result simplifies when specialized to this model. Observe that in this 
model 

\[ \Delta_t = (X^{1:K}_{1:t-n}, U^{1:K}_{1:t-n}) \tag{53} \]

and therefore the belief 

\[ \Theta_t = \mathbb{P}(X_{t-n} \mid \Delta_t) \tag{54} \]

is 1 for the true realization of $X_{t-n}$ and 0 otherwise. The result of Theorem 4 can now be restated 
for this case as follows: 

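Corollary 2 In Problem 1 with assumptions 1 and 2, there is an optimal coordination strategy of 
the form: 

\[ (\gamma^1_t, \gamma^2_t) = \psi_t(X_{t-n}, r^1_t, r^2_t), \quad t = 1, \dots, T. \tag{55} \]

Moreover, this optimal coordination strategy can be found by the following dynamic program: 

\[ J_T(x, \tilde{r}^1, \tilde{r}^2) = \inf_{\tilde{\gamma}^1, \tilde{\gamma}^2} \mathbb{E}\{\tilde{C}_T(X_{T-n}, r^1_T, r^2_T, \gamma^1_T, \gamma^2_T) \mid X_{T-n} = x, r^1_T = \tilde{r}^1, r^2_T = \tilde{r}^2, \gamma^1_T = \tilde{\gamma}^1, \gamma^2_T = \tilde{\gamma}^2\}. \tag{56} \]

For $t = 1, \dots, T-1$, let 

\[ J_t(x, \tilde{r}^1, \tilde{r}^2) = \inf_{\tilde{\gamma}^1, \tilde{\gamma}^2} \mathbb{E}\{\tilde{C}_t(X_{t-n}, r^1_t, r^2_t, \gamma^1_t, \gamma^2_t) + J_{t+1}(X_{t-n+1}, r^1_{t+1}, r^2_{t+1}) \mid X_{t-n} = x, r^1_t = \tilde{r}^1, r^2_t = \tilde{r}^2, \gamma^1_t = \tilde{\gamma}^1, \gamma^2_t = \tilde{\gamma}^2\}. \tag{57} \]

We note that the structural result and the sequential decomposition in the corollary above are 
analogous to Theorem 1 of [7]. 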



5. Kurtaran's Separation Result 

In this section, we focus on the structural result proposed by Kurtaran [3]. We restrict attention to the 
two-controller system ($K = 2$) and delay $n = 2$. For this case, we have 

\[ \Delta_t = (Y^1_{1:t-2}, Y^2_{1:t-2}, U^1_{1:t-2}, U^2_{1:t-2}) \]

and 

\[ Z_{t+1} = (Y^1_{t-1}, Y^2_{t-1}, U^1_{t-1}, U^2_{t-1}). \]

Kurtaran's structural result for this case states that without loss of optimality we can restrict 
attention to control strategies of the form: 

\[ U^k_t = g^k_t(\Lambda^k_t, \Phi_t), \quad k = 1,2, \tag{58} \]

where 

\[ \Phi_t := \mathbb{P}^{g}(X_{t-2}, U^1_{t-1}, U^2_{t-1} \mid \Delta_t). \]

Kurtaran [3] proved this result for the terminal time-step $T$ and simply stated that the result for 
$t = 1, \dots, T-1$ can be established by the dynamic programming argument given in [8]. We believe 
that this is not the case. 

In the dynamic programming argument in [8], a critical step is the update of the information 
state $\Phi_t$, which is given by [8, Eq (30)]. For Kurtaran's result, the corresponding equation 
is 

\[ \Phi_{t+1} = F_t(\Phi_t, Y^1_{t-1}, Y^2_{t-1}, U^1_{t-1}, U^2_{t-1}). \tag{59} \]

We believe that such an update equation cannot be established. 

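To see the difficulty in establishing (59), let us follow an argument similar to the proof of [8, Eq (30)] 
given in [8, Appendix B]. For a fixed strategy $g$, and a realization $\delta_{t+1}$ of $\Delta_{t+1}$, the realization $\varphi_{t+1}$ 
of $\Phi_{t+1}$ is given by 

\[
\begin{aligned}
\varphi_{t+1} &= \mathbb{P}(x_{t-1}, u^1_t, u^2_t \mid \delta_{t+1}) \\
&= \mathbb{P}(x_{t-1}, u^1_t, u^2_t \mid \delta_t, y^1_{t-1}, y^2_{t-1}, u^1_{t-1}, u^2_{t-1}) \\
&= \frac{\mathbb{P}(x_{t-1}, u^1_t, u^2_t, y^1_{t-1}, y^2_{t-1}, u^1_{t-1}, u^2_{t-1} \mid \delta_t)}{\sum_{(x', a^1, a^2) \in \mathcal{X} \times \mathcal{U}^1 \times \mathcal{U}^2} \mathbb{P}(X_{t-1} = x', U^1_t = a^1, U^2_t = a^2, y^1_{t-1}, y^2_{t-1}, u^1_{t-1}, u^2_{t-1} \mid \delta_t)}.
\end{aligned}
\tag{60}
\]

The numerator can be expressed as: 

\[
\begin{aligned}
&\mathbb{P}(x_{t-1}, u^1_t, u^2_t, y^1_{t-1}, y^2_{t-1}, u^1_{t-1}, u^2_{t-1} \mid \delta_t) \\
&\quad = \sum_{(x_{t-2}, y^1_t, y^2_t)} \mathbb{P}(x_{t-1}, u^1_t, u^2_t, y^1_{t-1}, y^2_{t-1}, u^1_{t-1}, u^2_{t-1}, x_{t-2}, y^1_t, y^2_t \mid \delta_t) \\
&\quad = \sum_{(x_{t-2}, y^1_t, y^2_t)} \mathbf{1}_{g^1_t(\delta_t, u^1_{t-1}, y^1_{t-1}, y^1_t)}[u^1_t] \cdot \mathbf{1}_{g^2_t(\delta_t, u^2_{t-1}, y^2_{t-1}, y^2_t)}[u^2_t] \cdot \mathbb{P}(y^1_t \mid x_{t-1}) \cdot \mathbb{P}(y^2_t \mid x_{t-1}) \\
&\qquad \cdot \mathbb{P}(x_{t-1} \mid x_{t-2}, u^1_{t-1}, u^2_{t-1}) \cdot \mathbf{1}_{g^1_{t-1}(\delta_{t-1}, u^1_{t-2}, y^1_{t-2}, y^1_{t-1})}[u^1_{t-1}] \cdot \mathbf{1}_{g^2_{t-1}(\delta_{t-1}, u^2_{t-2}, y^2_{t-2}, y^2_{t-1})}[u^2_{t-1}] \\
&\qquad \cdot \mathbb{P}(y^1_{t-1} \mid x_{t-2}) \cdot \mathbb{P}(y^2_{t-1} \mid x_{t-2}) \cdot \mathbb{P}(x_{t-2} \mid \delta_t).
\end{aligned}
\tag{61}
\]

If, in addition to $\varphi_t$, $y^1_{t-1}$, $y^2_{t-1}$, $u^1_{t-1}$, and $u^2_{t-1}$, each term of (61) depended only on terms that are 
being summed over $(x_{t-2}, y^1_t, y^2_t)$, then (61) would prove (59). However, this is not the case: the 
first two terms also depend on $\delta_t$. Therefore, the above calculation shows that $\varphi_{t+1}$ is a function 
of $\varphi_t$, $Y^1_{t-1}$, $Y^2_{t-1}$, $U^1_{t-1}$, $U^2_{t-1}$ and $\delta_t$. This dependence on $\delta_t$ is not an artifact of the order in which 
we decided to use the chain rule in (61) (we chose the natural sequential order in the system). No 
matter how we try to write $\varphi_{t+1}$ in terms of $\varphi_t$, there will be a dependence on $\delta_t$. 

The above argument shows that it is not possible to establish (59). Consequently, the dynamic 
programming argument presented in [8] breaks down when working with the information state 
of Kurtaran's structural result, and, hence, the proof in [3] is incomplete. So far, we have not been 
able to correct the proof or find a counterexample to it. 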

6. Conclusion 

We studied the stochastic control problem with $n$-step delayed sharing information structure and 
established two structural results for it. Both results characterize optimal control laws with 
time-invariant domains. Our second result also establishes a partial separation result, that is, it 
shows that the information state at time $t$ is separated from the choice of control laws before time $t - n + 1$. 
Both results agree with Witsenhausen's conjecture for $n = 1$. To derive our structural results, 
we formulated an alternative problem from the point of view of a coordinator of the system. We believe 
that this idea of formulating an alternative problem from the point of view of a coordinator which 
has access to information common to all controllers is also useful for general decentralized control 
problems, as is illustrated by [9] and [10]. 

A. Proof of Proposition 2 

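Fix a coordinator strategy $\psi$. Consider a realization $\delta_{t+1}$ of the common information $\Delta_{t+1}$. Let 
$(\tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t})$ be the corresponding choice of partial functions until time $t$. Then, the realization $\pi_{t+1}$ 
of $\Pi_{t+1}$ is given by 

\[ \pi_{t+1}(s_{t+1}) = \mathbb{P}^{\psi}(S_{t+1} = s_{t+1} \mid \delta_{t+1}, \tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t}). \tag{62} \]

Using Proposition 1, this can be written as 

\[
\begin{aligned}
\sum_{s_t, v_t, w^1_{t+1}, w^2_{t+1}} &\mathbf{1}_{s_{t+1}}\big(\hat{f}_{t+1}(s_t, v_t, w^1_{t+1}, w^2_{t+1}, \tilde{\gamma}^1_t, \tilde{\gamma}^2_t)\big) \cdot \mathbb{P}(V_t = v_t) \cdot \mathbb{P}(W^1_{t+1} = w^1_{t+1}) \\
&\cdot \mathbb{P}(W^2_{t+1} = w^2_{t+1}) \cdot \mathbb{P}^{\psi}(S_t = s_t \mid \delta_{t+1}, \tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t}).
\end{aligned}
\tag{63}
\]

Since $\delta_{t+1} = (\delta_t, z_{t+1})$, the last term of (63) can be written as 

\[ \mathbb{P}^{\psi}(S_t = s_t \mid \delta_t, z_{t+1}, \tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t}) = \frac{\mathbb{P}^{\psi}(S_t = s_t, Z_{t+1} = z_{t+1} \mid \delta_t, \tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t})}{\sum_{s'} \mathbb{P}^{\psi}(S_t = s', Z_{t+1} = z_{t+1} \mid \delta_t, \tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t})}. \tag{64} \]

We can use (18) and the sequential order in which the system variables are generated to write 

\[ \mathbb{P}^{\psi}(S_t = s_t, Z_{t+1} = z_{t+1} \mid \delta_t, \tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t}) = \mathbf{1}_{\hat{h}_t(s_t)}(z_{t+1}) \cdot \pi_t(s_t). \tag{65} \]

Substituting (65), (64), and (63) into (62), we can write 

\[ \pi_{t+1}(s_{t+1}) = F_{t+1}(\pi_t, \tilde{\gamma}^1_t, \tilde{\gamma}^2_t, z_{t+1})(s_{t+1}), \]

where $F_{t+1}(\cdot)$ is given by (62), (63), (64), and (65). 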



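B. Proof of Proposition 3 

Fix a coordinator strategy $\psi$. Consider a realization $\delta_{t+1}$ of the common information $\Delta_{t+1}$. Let $\pi_{1:t}$ 
be the corresponding realization of $\Pi_{1:t}$ and $(\tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t})$ the corresponding choice of partial functions 
until time $t$. Then, for any Borel subset $A \subset \mathcal{P}\{\mathcal{S}\}$, where $\mathcal{P}\{\mathcal{S}\}$ is the space of probability mass 
functions over the finite set $\mathcal{S}$ (the space of realizations of $S_t$), we can write using Proposition 2 

\[ \mathbb{P}(\Pi_{t+1} \in A \mid \delta_t, \pi_{1:t}, \tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t}) = \sum_{z_{t+1}} \mathbf{1}_A\big(F_{t+1}(\pi_t, \tilde{\gamma}^1_t, \tilde{\gamma}^2_t, z_{t+1})\big) \cdot \mathbb{P}(Z_{t+1} = z_{t+1} \mid \delta_t, \pi_{1:t}, \tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t}). \tag{66} \]

Now, using (18), we have 

\[
\begin{aligned}
\mathbb{P}(Z_{t+1} = z_{t+1} \mid \delta_t, \pi_{1:t}, \tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t}) &= \sum_{s_t} \mathbf{1}_{\hat{h}_t(s_t)}(z_{t+1}) \cdot \mathbb{P}(S_t = s_t \mid \delta_t, \pi_{1:t}, \tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t}) \\
&= \sum_{s_t} \mathbf{1}_{\hat{h}_t(s_t)}(z_{t+1}) \cdot \pi_t(s_t).
\end{aligned}
\tag{67}
\]

Substituting (67) back in (66), we get 

\[
\begin{aligned}
\mathbb{P}(\Pi_{t+1} \in A \mid \delta_t, \pi_{1:t}, \tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t}) &= \sum_{z_{t+1}} \sum_{s_t} \mathbf{1}_A\big(F_{t+1}(\pi_t, \tilde{\gamma}^1_t, \tilde{\gamma}^2_t, z_{t+1})\big) \cdot \mathbf{1}_{\hat{h}_t(s_t)}(z_{t+1}) \cdot \pi_t(s_t) \\
&= \mathbb{P}(\Pi_{t+1} \in A \mid \pi_t, \tilde{\gamma}^1_t, \tilde{\gamma}^2_t),
\end{aligned}
\tag{68}
\]

thereby proving (22). 

Now, using Proposition 1 we can write 

\[
\begin{aligned}
&\mathbb{E}\{c_t(S_t, \gamma^1_t, \gamma^2_t, S_{t+1}) \mid \delta_t, \pi_{1:t}, \tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t}\} \\
&\quad = \sum_{s_t, v_t, w^1_{t+1}, w^2_{t+1}} c_t\big(s_t, \tilde{\gamma}^1_t, \tilde{\gamma}^2_t, \hat{f}_{t+1}(s_t, v_t, w^1_{t+1}, w^2_{t+1}, \tilde{\gamma}^1_t, \tilde{\gamma}^2_t)\big) \cdot \mathbb{P}(V_t = v_t) \\
&\qquad \cdot \mathbb{P}(W^1_{t+1} = w^1_{t+1}) \cdot \mathbb{P}(W^2_{t+1} = w^2_{t+1}) \cdot \mathbb{P}(S_t = s_t \mid \delta_t, \pi_{1:t}, \tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t}) \\
&\quad = \sum_{s_t, v_t, w^1_{t+1}, w^2_{t+1}} c_t\big(s_t, \tilde{\gamma}^1_t, \tilde{\gamma}^2_t, \hat{f}_{t+1}(s_t, v_t, w^1_{t+1}, w^2_{t+1}, \tilde{\gamma}^1_t, \tilde{\gamma}^2_t)\big) \cdot \mathbb{P}(V_t = v_t) \\
&\qquad \cdot \mathbb{P}(W^1_{t+1} = w^1_{t+1}) \cdot \mathbb{P}(W^2_{t+1} = w^2_{t+1}) \cdot \pi_t(s_t) \\
&\quad =: C_t(\pi_t, \tilde{\gamma}^1_t, \tilde{\gamma}^2_t).
\end{aligned}
\tag{69}
\]

This proves (23). 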

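C. Piecewise linearity and concavity of value function 

Since $C_t(\Pi_t, \gamma^1_t, \gamma^2_t) = \mathbb{E}\{c_t(S_t, \gamma^1_t, \gamma^2_t, S_{t+1}) \mid \Delta_t, \Pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}\}$, the value function at time $T$ can 
be written as 

\[ J_T(\pi) = \inf_{\tilde{\gamma}^1, \tilde{\gamma}^2} \mathbb{E}\{c_T(S_T, \tilde{\gamma}^1, \tilde{\gamma}^2, S_{T+1}) \mid \Pi_T = \pi, \gamma^1_T = \tilde{\gamma}^1, \gamma^2_T = \tilde{\gamma}^2\}. \tag{70} \]

For a given choice of $\tilde{\gamma}^1, \tilde{\gamma}^2$, the expectation in equation (70) can be written as: 

\[ \sum_{s_T, v_T, w^1_{T+1}, w^2_{T+1}} c_T\big(s_T, \tilde{\gamma}^1, \tilde{\gamma}^2, \hat{f}_{T+1}(s_T, v_T, w^1_{T+1}, w^2_{T+1}, \tilde{\gamma}^1, \tilde{\gamma}^2)\big) \cdot \mathbb{P}(V_T = v_T, W^1_{T+1} = w^1_{T+1}, W^2_{T+1} = w^2_{T+1}) \cdot \pi(s_T). \tag{71} \]

The expression in (71) is linear in $\pi$. Therefore, the value function $J_T(\pi)$ is the infimum of finitely 
many linear functions of $\pi$. Hence, $J_T(\pi)$ is a piecewise-linear (and hence concave) function. We 
now proceed inductively. 

First assume that $J_{t+1}(\pi)$ is a concave function. Then, $J_{t+1}$ can be written as the infimum of a 
family of affine functions: 

\[ J_{t+1}(\pi) = \inf_i \sum_{s \in \mathcal{S}} a_i(s) \cdot \pi(s) + b_i, \tag{72} \]

where $a_i(s)$, $s \in \mathcal{S}$, and $b_i$ are real numbers. The value function at time $t$ is given as: 

\[
\begin{aligned}
J_t(\pi) = \inf_{\tilde{\gamma}^1, \tilde{\gamma}^2} \Big[ &\mathbb{E}\{c_t(S_t, \tilde{\gamma}^1, \tilde{\gamma}^2, S_{t+1}) \mid \Pi_t = \pi, \gamma^1_t = \tilde{\gamma}^1, \gamma^2_t = \tilde{\gamma}^2\} \\
&+ \mathbb{E}\{J_{t+1}(\Pi_{t+1}) \mid \Pi_t = \pi, \gamma^1_t = \tilde{\gamma}^1, \gamma^2_t = \tilde{\gamma}^2\} \Big].
\end{aligned}
\tag{73}
\]

For a given choice of $\tilde{\gamma}^1, \tilde{\gamma}^2$, the first expectation in (73) can be written as 

\[ \sum_{s_t, v_t, w^1_{t+1}, w^2_{t+1}} c_t\big(s_t, \tilde{\gamma}^1, \tilde{\gamma}^2, \hat{f}_{t+1}(s_t, v_t, w^1_{t+1}, w^2_{t+1}, \tilde{\gamma}^1, \tilde{\gamma}^2)\big) \cdot \mathbb{P}(V_t = v_t, W^1_{t+1} = w^1_{t+1}, W^2_{t+1} = w^2_{t+1}) \cdot \pi(s_t). \tag{74} \]

Thus, for a given choice of $\tilde{\gamma}^1, \tilde{\gamma}^2$, the first expectation in (73) is linear in $\pi$. Using Proposition 2, 
the second expectation in (73) can be written as: 

\[
\begin{aligned}
&\mathbb{E}\{J_{t+1}(F_{t+1}(\Pi_t, \tilde{\gamma}^1, \tilde{\gamma}^2, Z_{t+1})) \mid \Pi_t = \pi, \gamma^1_t = \tilde{\gamma}^1, \gamma^2_t = \tilde{\gamma}^2\} \\
&\quad = \sum_{z_{t+1}} J_{t+1}\big(F_{t+1}(\pi, \tilde{\gamma}^1, \tilde{\gamma}^2, z_{t+1})\big) \cdot \mathbb{P}(Z_{t+1} = z_{t+1} \mid \Pi_t = \pi, \gamma^1_t = \tilde{\gamma}^1, \gamma^2_t = \tilde{\gamma}^2) \\
&\quad = \sum_{z_{t+1}} \Big[ \inf_i \Big\{ \sum_{s} a_i(s) \cdot \big(F_{t+1}(\pi, \tilde{\gamma}^1, \tilde{\gamma}^2, z_{t+1})\big)(s) + b_i \Big\} \Big] \cdot \mathbb{P}(Z_{t+1} = z_{t+1} \mid \Pi_t = \pi, \gamma^1_t = \tilde{\gamma}^1, \gamma^2_t = \tilde{\gamma}^2).
\end{aligned}
\tag{75}
\]

We now focus on each term in the outer summation in (75). For each value of $z_{t+1}$, these terms 
can be written as: 

\[
\begin{aligned}
\inf_i \Big\{ &\sum_{s} a_i(s) \cdot \big(F_{t+1}(\pi, \tilde{\gamma}^1, \tilde{\gamma}^2, z_{t+1})\big)(s) \cdot \mathbb{P}(Z_{t+1} = z_{t+1} \mid \Pi_t = \pi, \gamma^1_t = \tilde{\gamma}^1, \gamma^2_t = \tilde{\gamma}^2) \\
&+ b_i \cdot \mathbb{P}(Z_{t+1} = z_{t+1} \mid \Pi_t = \pi, \gamma^1_t = \tilde{\gamma}^1, \gamma^2_t = \tilde{\gamma}^2) \Big\}.
\end{aligned}
\tag{76}
\]

We first note that the term $b_i \cdot \mathbb{P}(Z_{t+1} = z_{t+1} \mid \Pi_t = \pi, \gamma^1_t = \tilde{\gamma}^1, \gamma^2_t = \tilde{\gamma}^2)$ is affine in $\pi$. This is 
because: 

\[ b_i \cdot \mathbb{P}(Z_{t+1} = z_{t+1} \mid \Pi_t = \pi, \gamma^1_t = \tilde{\gamma}^1, \gamma^2_t = \tilde{\gamma}^2) = b_i \cdot \sum_{s' \in \mathcal{S}} \mathbf{1}_{\hat{h}_t(s')}(z_{t+1}) \cdot \pi(s'). \tag{77} \]

Moreover, using the characterization of $F_{t+1}$ from the proof of Proposition 2 (Appendix A), we can 
write the term with coefficients $a_i(s)$ in (76) as 

\[ \sum_{s} a_i(s) \sum_{s_t, v_t, w^1_{t+1}, w^2_{t+1}} \mathbf{1}_{s}\big(\hat{f}_{t+1}(s_t, v_t, w^1_{t+1}, w^2_{t+1}, \tilde{\gamma}^1, \tilde{\gamma}^2)\big) \cdot \mathbb{P}(V_t = v_t, W^1_{t+1} = w^1_{t+1}, W^2_{t+1} = w^2_{t+1}) \cdot \mathbf{1}_{\hat{h}_t(s_t)}(z_{t+1}) \, \pi(s_t), \tag{78} \]

which is also affine in $\pi$. Using equations (76), (77) and (78) in (75), we conclude that for a given 
choice of $\tilde{\gamma}^1, \tilde{\gamma}^2$, the second expectation in (73) is concave in $\pi$. Thus, the value function $J_t(\pi)$ is 
the minimum of finitely many functions, each of which is the sum of an affine and a concave function 
of $\pi$. This implies that $J_t$ is concave in $\pi$. This completes the induction argument. 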
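
The induction step relies on the standard fact that a pointwise infimum of affine functions is concave. For completeness, a short verification is sketched here; the symbols $\ell_i$, $\phi$, $\pi_1$, $\pi_2$ and $\lambda$ are introduced only for this illustration, with $\ell_i(\pi) := \sum_{s} a_i(s)\pi(s) + b_i$, $\phi(\pi) := \inf_i \ell_i(\pi)$ and $\lambda \in [0,1]$.

```latex
\[
\begin{aligned}
\phi(\lambda \pi_1 + (1-\lambda)\pi_2)
  &= \inf_i \big\{ \lambda\, \ell_i(\pi_1) + (1-\lambda)\, \ell_i(\pi_2) \big\}
     && \text{(each } \ell_i \text{ is affine)} \\
  &\ge \lambda \inf_i \ell_i(\pi_1) + (1-\lambda) \inf_i \ell_i(\pi_2)
     && \text{(infimum of a sum)} \\
  &= \lambda\, \phi(\pi_1) + (1-\lambda)\, \phi(\pi_2).
\end{aligned}
\]
```

The same inequality, applied to the minimum over prescription pairs of functions that are each concave, gives the final claim of the induction.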
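D. Proof of Proposition 4 

1. Recall that $Z_{t+1} = (Y^1_{t-n+1}, Y^2_{t-n+1}, U^1_{t-n+1}, U^2_{t-n+1})$ and $\Delta_{t+1} = \Delta_t \cup Z_{t+1}$. Fix a 
coordination strategy $\psi$ and consider a realization $\delta_{t+1}$ of $\Delta_{t+1}$. Then, 

\[
\begin{aligned}
\theta_{t+1}(x_{t-n+1}) &:= \mathbb{P}(X_{t-n+1} = x_{t-n+1} \mid \delta_{t+1}) \\
&= \mathbb{P}(X_{t-n+1} = x_{t-n+1} \mid \delta_t, y^1_{t-n+1}, y^2_{t-n+1}, u^1_{t-n+1}, u^2_{t-n+1}) \\
&= \sum_{x \in \mathcal{X}} \mathbb{P}(X_{t-n+1} = x_{t-n+1} \mid X_{t-n} = x, u^1_{t-n+1}, u^2_{t-n+1}) \\
&\qquad \cdot \mathbb{P}(X_{t-n} = x \mid \delta_t, y^1_{t-n+1}, y^2_{t-n+1}, u^1_{t-n+1}, u^2_{t-n+1}) \\
&= \sum_{x \in \mathcal{X}} \mathbb{P}(X_{t-n+1} = x_{t-n+1} \mid X_{t-n} = x, u^1_{t-n+1}, u^2_{t-n+1}) \\
&\qquad \cdot \frac{\mathbb{P}(X_{t-n} = x, y^1_{t-n+1}, y^2_{t-n+1}, u^1_{t-n+1}, u^2_{t-n+1} \mid \delta_t)}{\sum_{x'} \mathbb{P}(X_{t-n} = x', y^1_{t-n+1}, y^2_{t-n+1}, u^1_{t-n+1}, u^2_{t-n+1} \mid \delta_t)}.
\end{aligned}
\tag{81}
\]

Consider the second term of (81), and note that under any coordination strategy $\psi$, the 
variables $u^1_{t-n+1}, u^2_{t-n+1}$ are deterministic functions of $y^1_{t-n+1}, y^2_{t-n+1}$ and $\delta_t$ (which is the same 
as $y^{1:2}_{1:t-n}, u^{1:2}_{1:t-n}$). Therefore, the second term of (81) can be written as 

\[
\begin{aligned}
&\frac{\mathbb{P}(u^1_{t-n+1}, u^2_{t-n+1} \mid y^1_{t-n+1}, y^2_{t-n+1}, \delta_t) \cdot \mathbb{P}(y^1_{t-n+1}, y^2_{t-n+1} \mid X_{t-n} = x) \cdot \mathbb{P}(X_{t-n} = x \mid \delta_t)}{\sum_{x'} \mathbb{P}(u^1_{t-n+1}, u^2_{t-n+1} \mid y^1_{t-n+1}, y^2_{t-n+1}, \delta_t) \cdot \mathbb{P}(y^1_{t-n+1}, y^2_{t-n+1} \mid X_{t-n} = x') \cdot \mathbb{P}(X_{t-n} = x' \mid \delta_t)} \\
&\quad = \frac{\mathbb{P}(y^1_{t-n+1} \mid X_{t-n} = x) \cdot \mathbb{P}(y^2_{t-n+1} \mid X_{t-n} = x) \cdot \theta_t(x)}{\sum_{x'} \mathbb{P}(y^1_{t-n+1} \mid X_{t-n} = x') \cdot \mathbb{P}(y^2_{t-n+1} \mid X_{t-n} = x') \cdot \theta_t(x')}.
\end{aligned}
\tag{82}
\]

Substituting (82) in (81), we conclude that $\theta_{t+1}$ is a function of $\theta_t$ and $z_{t+1}$. 
Consider next $r^k_{t+1} := (r^k_{m,t+1},\ t-n+2 \le m \le t)$. For $m = t$, we have $r^k_{t,t+1}(\cdot) := \gamma^k_t(\cdot, Y^k_{t-n+1}, U^k_{t-n+1})$. 
Since $Y^k_{t-n+1}$ and $U^k_{t-n+1}$ are part of $Z_{t+1}$, $r^k_{t,t+1}$ is a function of $\gamma^k_t$ and $Z_{t+1}$. Also, for 
$m = t-n+2, t-n+3, \dots, t-1$, 

\[
\begin{aligned}
r^k_{m,t+1}(\cdot) &:= \gamma^k_m(\cdot, Y^k_{m-n+1:t+1-n}, U^k_{m-n+1:t+1-n}) \\
&= r^k_{m,t}(\cdot, Y^k_{t-n+1}, U^k_{t-n+1}).
\end{aligned}
\tag{83}
\]

Thus, for $m = t-n+2, t-n+3, \dots, t-1$, $r^k_{m,t+1}$ is a function of $r^k_{m,t}$ and $Z_{t+1}$. 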

2. We will first show that the coordinator's belief $\Pi_t$ defined in (20) is a function of $(\Theta_t, r^1_t, r^2_t)$. 
That is, there exist functions $H_t$, for $t = 1, 2, \dots, T$, such that 

\[ \Pi_t = H_t(\Theta_t, r^1_t, r^2_t). \tag{84} \]

Using this fact together with equation (23) from Proposition 3, we can conclude that 

\[
\begin{aligned}
\mathbb{E}\{c_t(S_t, \gamma^1_t, \gamma^2_t, S_{t+1}) \mid \Delta_t, \Pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}\} &= C_t(\Pi_t, \gamma^1_t, \gamma^2_t) \\
&= \tilde{C}_t(\Theta_t, r^1_t, r^2_t, \gamma^1_t, \gamma^2_t),
\end{aligned}
\tag{85}
\]

where we use the fact that $\Pi_t$ is a function of $(\Theta_t, r^1_t, r^2_t)$ in equation (85). In order to 
prove (84), we need the following lemma: 

Lemma 1 $S_t := (X_{t-1}, \Lambda^1_t, \Lambda^2_t)$ is a deterministic function of $(X_{t-n}, V_{t-n+1:t-1}, W^1_{t-n+1:t}, 
W^2_{t-n+1:t}, r^1_t, r^2_t)$. That is, there exists a fixed deterministic function $D_t$ such that 

\[ S_t := (X_{t-1}, \Lambda^1_t, \Lambda^2_t) = D_t(X_{t-n}, V_{t-n+1:t-1}, W^1_{t-n+1:t}, W^2_{t-n+1:t}, r^1_t, r^2_t). \tag{86} \]

□ 

Proof We can reconstruct $(X_{t-n+1:t-1}, \Lambda^1_t, \Lambda^2_t)$ from $(X_{t-n}, V_{t-n+1:t-1}, W^1_{t-n+1:t}, 
W^2_{t-n+1:t}, r^1_t, r^2_t)$ using the given dynamics of the system (1), the observation equation (2) and 
the definition of $r^k_t$ in a straightforward manner. Firstly note that 

\[ (X_{t-n+1:t-1}, \Lambda^1_t, \Lambda^2_t) = (X_{t-n+1:t-1}, Y^{1:2}_{t-n+1:t}, U^{1:2}_{t-n+1:t-1}). \tag{87} \]

We first look at the random variables $(X_{t-n+1}, Y^{1:2}_{t-n+1}, U^{1:2}_{t-n+1})$. We have, for $k = 1, 2$, 

\[
\begin{aligned}
Y^k_{t-n+1} &= h^k_{t-n+1}(X_{t-n}, W^k_{t-n+1}), \\
U^k_{t-n+1} &= r^k_{t-n+1,t}(Y^k_{t-n+1}).
\end{aligned}
\tag{88}
\]

Further, by the system dynamics, 

\[ X_{t-n+1} = f_{t-n+1}(X_{t-n}, U^{1:2}_{t-n+1}, V_{t-n+1}). \tag{89} \]

Thus $(X_{t-n+1}, Y^{1:2}_{t-n+1}, U^{1:2}_{t-n+1})$ is a deterministic function of $(X_{t-n}, W^{1:2}_{t-n+1}, V_{t-n+1}, r^{1:2}_{t-n+1,t})$. 

Now assume $(X_{t-n+1:m}, Y^{1:2}_{t-n+1:m}, U^{1:2}_{t-n+1:m})$ is a function of $(X_{t-n}, W^{1:2}_{t-n+1:m}, V_{t-n+1:m}, r^{1:2}_{t-n+1:m,t})$. 
We have shown above that this is true for $m = t-n+1$. Then, for $m = t-n+1, \dots, t-2$, 

\[
\begin{aligned}
Y^k_{m+1} &= h^k_{m+1}(X_m, W^k_{m+1}), \\
U^k_{m+1} &= r^k_{m+1,t}(Y^k_{t-n+1:m+1}, U^k_{t-n+1:m}).
\end{aligned}
\]

Further, by the system dynamics, 

\[ X_{m+1} = f_{m+1}(X_m, U^{1:2}_{m+1}, V_{m+1}). \tag{90} \]

Thus, $(X_{m+1}, Y^{1:2}_{m+1}, U^{1:2}_{m+1})$ is a deterministic function of 

\[ (X_m, Y^{1:2}_{t-n+1:m}, U^{1:2}_{t-n+1:m}, W^{1:2}_{m+1}, V_{m+1}, r^{1:2}_{m+1,t}). \]

Combining this with our induction hypothesis, we conclude that $(X_{t-n+1:m+1}, Y^{1:2}_{t-n+1:m+1}, 
U^{1:2}_{t-n+1:m+1})$ is a function of $(X_{t-n}, W^{1:2}_{t-n+1:m+1}, V_{t-n+1:m+1}, r^{1:2}_{t-n+1:m+1,t})$. Thus, by induction 
we have that 

\[ (X_{t-n+1:t-1}, Y^{1:2}_{t-n+1:t-1}, U^{1:2}_{t-n+1:t-1}) \]

is a function of 

\[ (X_{t-n}, W^{1:2}_{t-n+1:t-1}, V_{t-n+1:t-1}, r^{1:2}_{t-n+1:t-1,t}). \]

Finally, noting that $Y^k_t = h^k_t(X_{t-1}, W^k_t)$ and that $r^k_t = r^k_{t-n+1:t-1,t}$, we can conclude that 
there exists a deterministic function $\hat{D}_t$ such that 

\[ (X_{t-n+1:t-1}, Y^{1:2}_{t-n+1:t}, U^{1:2}_{t-n+1:t-1}) = \hat{D}_t(X_{t-n}, V_{t-n+1:t-1}, W^1_{t-n+1:t}, W^2_{t-n+1:t}, r^1_t, r^2_t). \tag{91} \]

This implies the existence of a function $D_t$ such that 

\[ S_t := (X_{t-1}, \Lambda^1_t, \Lambda^2_t) = D_t(X_{t-n}, V_{t-n+1:t-1}, W^1_{t-n+1:t}, W^2_{t-n+1:t}, r^1_t, r^2_t). \tag{92} \]
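
The reconstruction argument of Lemma 1 is a forward roll-out, which can be sketched in Python. The sketch assumes, for simplicity, time-invariant dynamics `f` and observation maps `h1`, `h2` passed in as callables, and partial functions that accept the private observations and actions accumulated so far; all names and the toy model are hypothetical.

```python
from typing import Callable, Sequence, Tuple

PartialFn = Callable[[Tuple[int, ...], Tuple[int, ...]], int]


def reconstruct_state(x, v: Sequence, w1: Sequence, w2: Sequence,
                      r1: Sequence[PartialFn], r2: Sequence[PartialFn],
                      f: Callable, h1: Callable, h2: Callable):
    """Roll forward from X_{t-n} to S_t = (X_{t-1}, Lambda^1_t, Lambda^2_t).

    v holds V_{t-n+1:t-1}; w1 and w2 hold W^k_{t-n+1:t};
    r1 and r2 hold the partial functions r^k_{m,t} for m = t-n+1, ..., t-1.
    """
    ys1, ys2, us1, us2 = [], [], [], []
    for j in range(len(v)):                      # times m = t-n+1, ..., t-1
        ys1.append(h1(x, w1[j]))                 # Y^1_m = h^1(X_{m-1}, W^1_m)
        ys2.append(h2(x, w2[j]))
        u1 = r1[j](tuple(ys1), tuple(us1))       # U^1_m from the partial function
        u2 = r2[j](tuple(ys2), tuple(us2))
        us1.append(u1)
        us2.append(u2)
        x = f(x, u1, u2, v[j])                   # X_m = f(X_{m-1}, U_m, V_m)
    ys1.append(h1(x, w1[-1]))                    # observations at time t
    ys2.append(h2(x, w2[-1]))
    return x, (tuple(ys1), tuple(us1)), (tuple(ys2), tuple(us2))


# Toy binary model with n = 3 (illustrative only).
def _last_obs(ys, us):
    return ys[-1]


s_t = reconstruct_state(
    x=0, v=[1, 0], w1=[0, 1, 0], w2=[1, 1, 1],
    r1=[_last_obs, _last_obs], r2=[_last_obs, _last_obs],
    f=lambda x, u1, u2, v: (x + u1 + u2 + v) % 2,
    h1=lambda x, w: (x + w) % 2,
    h2=lambda x, w: (x + w) % 2,
)
```

No randomness enters the function: given the initial state, the noise realizations and the partial functions, the output is fully determined, which is the content of the lemma.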
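Now consider 

\[
\begin{aligned}
\Pi_t(s_t) &:= \mathbb{P}^{\psi}(S_t = s_t \mid \Delta_t, \gamma^1_{1:t-1}, \gamma^2_{1:t-1}) \\
&= \sum_{x_{t-n},\, v_{t-n+1:t-1},\, w^{1:2}_{t-n+1:t},\, \tilde{r}^{1:2}} \mathbf{1}_{s_t}\{D_t(x_{t-n}, v_{t-n+1:t-1}, w^{1:2}_{t-n+1:t}, \tilde{r}^1, \tilde{r}^2)\} \\
&\qquad \cdot \mathbb{P}(x_{t-n}, v_{t-n+1:t-1}, w^{1:2}_{t-n+1:t}, \tilde{r}^1, \tilde{r}^2 \mid \Delta_t, \gamma^{1:2}_{1:t-1}).
\end{aligned}
\tag{93}
\]

Note that $r^1_t, r^2_t$ are completely determined by $\Delta_t$ and $\gamma^{1:2}_{1:t-1}$, and the noise random variables 
$v_{t-n+1:t-1}, w^{1:2}_{t-n+1:t}$ are independent of the conditioning terms and $X_{t-n}$. We can therefore 
write (93) as 

\[
\begin{aligned}
\sum_{x_{t-n},\, v_{t-n+1:t-1},\, w^{1:2}_{t-n+1:t},\, \tilde{r}^{1:2}} &\mathbf{1}_{s_t}\{D_t(x_{t-n}, v_{t-n+1:t-1}, w^{1:2}_{t-n+1:t}, \tilde{r}^1, \tilde{r}^2)\} \cdot \mathbb{P}(v_{t-n+1:t-1}, w^{1:2}_{t-n+1:t}) \\
&\cdot \mathbf{1}_{\tilde{r}^1, \tilde{r}^2}(r^1_t, r^2_t) \cdot \mathbb{P}(x_{t-n} \mid \Delta_t, \gamma^{1:2}_{1:t-1}).
\end{aligned}
\tag{94}
\]

In the last term of (94), we can drop $\gamma^{1:2}_{1:t-1}$ from the conditioning terms since they are 
functions of $\Delta_t$. The last term is therefore the same as $\mathbb{P}(x_{t-n} \mid \Delta_t) = \Theta_t$. Thus, $\Pi_t$ is a function 
of $\Theta_t$ and $r^1_t, r^2_t$. 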



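3. Consider the following probability: 

\[
\begin{aligned}
&\mathbb{P}(\Theta_{t+1} = \theta_{t+1}, r^1_{t+1} = \tilde{r}^1_{t+1}, r^2_{t+1} = \tilde{r}^2_{t+1} \mid \delta_t, \theta_{1:t}, \tilde{r}^1_{1:t}, \tilde{r}^2_{1:t}, \tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t}) \\
&\quad = \sum_{z_{t+1}} \mathbf{1}_{\theta_{t+1}}\big(Q_t(\theta_t, z_{t+1})\big) \cdot \mathbf{1}_{\tilde{r}^1_{t+1}}\big(Q^1_t(\tilde{r}^1_t, z_{t+1}, \tilde{\gamma}^1_t)\big) \cdot \mathbf{1}_{\tilde{r}^2_{t+1}}\big(Q^2_t(\tilde{r}^2_t, z_{t+1}, \tilde{\gamma}^2_t)\big) \\
&\qquad \cdot \mathbb{P}(Z_{t+1} = z_{t+1} \mid \delta_t, \tilde{\gamma}^{1:2}_{1:t}, \tilde{r}^1_{1:t}, \tilde{r}^2_{1:t}).
\end{aligned}
\tag{95}
\]

The last probability in equation (95) can be written as: 

\[
\begin{aligned}
\mathbb{P}(Z_{t+1} = z_{t+1} \mid \delta_t, \tilde{\gamma}^{1:2}_{1:t}, \tilde{r}^1_{1:t}, \tilde{r}^2_{1:t}) &= \sum_{s_t} \mathbf{1}_{\hat{h}_t(s_t)}(z_{t+1}) \cdot \mathbb{P}(S_t = s_t \mid \delta_t, \tilde{\gamma}^{1:2}_{1:t}, \tilde{r}^1_{1:t}, \tilde{r}^2_{1:t}) \\
&= \sum_{s_t} \mathbf{1}_{\hat{h}_t(s_t)}(z_{t+1}) \cdot \mathbb{P}^{\psi}(S_t = s_t \mid \delta_t) \\
&= \sum_{s_t} \mathbf{1}_{\hat{h}_t(s_t)}(z_{t+1}) \cdot H_t(\theta_t, \tilde{r}^1_t, \tilde{r}^2_t)(s_t).
\end{aligned}
\tag{96}
\]

Substituting (96) back in (95), we get 

\[
\begin{aligned}
&\mathbb{P}(\Theta_{t+1} = \theta_{t+1}, r^1_{t+1} = \tilde{r}^1_{t+1}, r^2_{t+1} = \tilde{r}^2_{t+1} \mid \delta_t, \theta_{1:t}, \tilde{r}^1_{1:t}, \tilde{r}^2_{1:t}, \tilde{\gamma}^1_{1:t}, \tilde{\gamma}^2_{1:t}) \\
&\quad = \sum_{z_{t+1}, s_t} \mathbf{1}_{\theta_{t+1}}\big(Q_t(\theta_t, z_{t+1})\big) \cdot \mathbf{1}_{\tilde{r}^1_{t+1}}\big(Q^1_t(\tilde{r}^1_t, z_{t+1}, \tilde{\gamma}^1_t)\big) \\
&\qquad \cdot \mathbf{1}_{\tilde{r}^2_{t+1}}\big(Q^2_t(\tilde{r}^2_t, z_{t+1}, \tilde{\gamma}^2_t)\big) \cdot \mathbf{1}_{\hat{h}_t(s_t)}(z_{t+1}) \cdot H_t(\theta_t, \tilde{r}^1_t, \tilde{r}^2_t)(s_t) \\
&\quad = \mathbb{P}(\Theta_{t+1} = \theta_{t+1}, r^1_{t+1} = \tilde{r}^1_{t+1}, r^2_{t+1} = \tilde{r}^2_{t+1} \mid \theta_t, \tilde{r}^1_t, \tilde{r}^2_t, \tilde{\gamma}^1_t, \tilde{\gamma}^2_t),
\end{aligned}
\tag{97}
\]

thereby proving (43). 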